Facts dimensions week2 #56

Open
wants to merge 5 commits into
base: main
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
# Analytics engineering with dbt

Template repository for the projects and environment of the course: Analytics engineering with dbt.
First project submission, modeled on the Greenery data model.
Next steps: create a new branch and rework the source and staging models.

> Please note that this sets some environment variables so if you create some new terminals please load them again.

4 changes: 4 additions & 0 deletions greenery/.gitignore
@@ -0,0 +1,4 @@

target/
dbt_packages/
logs/
Empty file added greenery/Project_2_Readme.md
Empty file.
15 changes: 15 additions & 0 deletions greenery/README.md
@@ -0,0 +1,15 @@
Welcome to your new dbt project!

### Using the starter project

Try running the following commands:
- dbt run
- dbt test


### Resources:
- Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction)
- Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers
- Join the [chat](https://community.getdbt.com/) on Slack for live discussions and support
- Find [dbt events](https://events.getdbt.com) near you
- Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices
Empty file added greenery/analyses/.gitkeep
Empty file.
53 changes: 53 additions & 0 deletions greenery/analyses/_analysis_project1.sql
@@ -0,0 +1,53 @@
--How many users do we have?
SELECT COUNT(DISTINCT USER_ID) FROM DEV_DB.DBT_COKERBUSAYO.STG_POSTGRES_USERS;

--On average, how many orders do we receive per hour?

WITH orders_per_hour AS (
SELECT HOUR(CREATED_AT) AS ORDER_HOUR
,COUNT(DISTINCT ORDER_ID) AS NUM_OF_ORDERS
FROM DEV_DB.DBT_COKERBUSAYO.STG_POSTGRES_ORDERS
GROUP BY HOUR(CREATED_AT)
)

SELECT AVG(NUM_OF_ORDERS)
FROM orders_per_hour;

--On average, how long does an order take from being placed to being delivered?
WITH orders_delivered AS (
SELECT ORDER_ID
,CREATED_AT
,DELIVERED_AT
,TIMESTAMPDIFF('minutes', CREATED_AT, DELIVERED_AT) AS DELIVERY_TIME
FROM DEV_DB.DBT_COKERBUSAYO.STG_POSTGRES_ORDERS
WHERE DELIVERED_AT IS NOT NULL
)

SELECT ROUND((AVG(DELIVERY_TIME)/60),2)
FROM orders_delivered;

--How many users have only made one purchase? Two purchases? Three+ purchases?
WITH customer_purchases AS (
SELECT USER_ID
,COUNT(DISTINCT ORDER_ID) AS NUM_ORDERS
FROM DEV_DB.DBT_COKERBUSAYO.STG_POSTGRES_ORDERS
GROUP BY USER_ID
)

SELECT SUM(CASE WHEN NUM_ORDERS = 1 THEN 1 ELSE 0 END) AS one
,SUM(CASE WHEN NUM_ORDERS = 2 THEN 1 ELSE 0 END) AS two
,SUM(CASE WHEN NUM_ORDERS >= 3 THEN 1 ELSE 0 END) AS three_or_more
FROM customer_purchases;

--On average, how many unique sessions do we have per hour?
WITH sessions_per_hour AS (
SELECT HOUR(CREATED_AT) AS SESSION_HOUR
,COUNT(DISTINCT SESSION_ID) AS NUM_SESSIONS
FROM DEV_DB.DBT_COKERBUSAYO.STG_POSTGRES_EVENTS
GROUP BY HOUR(CREATED_AT)
)

SELECT AVG(NUM_SESSIONS)
FROM sessions_per_hour;

select distinct event_type from DEV_DB.DBT_COKERBUSAYO.STG_POSTGRES_EVENTS;
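
A note on the per-hour averages above: `HOUR(CREATED_AT)` buckets rows by hour of day (0 to 23), so orders placed at 09:xx on different days share one bucket. If the question is read as orders per calendar hour, truncating the timestamp is one option. A minimal sketch on hypothetical inline rows, assuming Snowflake-style `date_trunc` and `::timestamp` casts:

```sql
-- Toy rows (hypothetical): two orders at 09:xx on different days, one at 10:xx.
with orders as (
    select 1 as order_id, '2021-02-10 09:15:00'::timestamp as created_at
    union all
    select 2, '2021-02-11 09:45:00'::timestamp
    union all
    select 3, '2021-02-11 10:05:00'::timestamp
),

-- One bucket per calendar hour rather than per hour of day.
orders_per_hour as (
    select
        date_trunc('hour', created_at) as order_hour,
        count(distinct order_id) as num_of_orders
    from orders
    group by 1
)

select avg(num_of_orders) as avg_orders_per_hour
from orders_per_hour;
```

Each of the three calendar hours holds one order, so the average is 1. Grouping the same rows by `HOUR(created_at)` gives buckets of 2 and 1 and an average of 1.5. Neither version counts hours with no orders at all, and the same consideration applies to the sessions-per-hour query.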
17 changes: 17 additions & 0 deletions greenery/analyses/analysis_project3.sql
@@ -0,0 +1,17 @@
-- What is our overall conversion rate?
select
count(distinct session_id) as sessions,
count(distinct order_id) as orders,
orders/sessions * 100 as conversion_rate
from fct_conversion;


-- What is our conversion rate by product?

select
product_id,
count(distinct session_id) as sessions,
count(distinct order_id) as orders,
orders/sessions * 100 as conversion_rate
from fct_conversion
group by 1;
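
The conversion-rate arithmetic can be checked on a few rows. This sketch assumes `fct_conversion` is session-grained, with a null `order_id` for sessions that did not convert. That shape is an assumption, since the model itself is not part of this diff, and the inline rows are hypothetical.

```sql
-- Hypothetical rows: four sessions, two of which ended in an order.
with fct_conversion as (
    select 's1' as session_id, 'o1' as order_id
    union all select 's2', null
    union all select 's3', 'o2'
    union all select 's4', null
)

select
    count(distinct session_id) as sessions,
    count(distinct order_id) as orders,  -- count(distinct ...) skips nulls
    round(
        100.0 * count(distinct order_id)
            / nullif(count(distinct session_id), 0),  -- avoid dividing by zero
        2
    ) as conversion_rate
from fct_conversion;
```

With four sessions and two orders the rate comes out at 50. The aggregates are repeated instead of referenced by alias so the query also runs on engines without lateral column aliases, and multiplying by `100.0` first keeps the division from being integer division on those engines.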
42 changes: 42 additions & 0 deletions greenery/dbt_project.yml
@@ -0,0 +1,42 @@

# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'greenery'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'greenery'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
- "target"
- "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this config, we tell dbt to build models under staging/ as views and models
# under marts/ as tables. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  greenery:
    # Config indicated by + and applies to all files under models/staging/
    staging:
      +materialized: view
    marts:
      +materialized: table
      +post-hook:
        - grant select on {{ this }} to role reporting
Empty file added greenery/macros/.gitkeep
Empty file.
21 changes: 21 additions & 0 deletions greenery/macros/event_type_counts.sql
@@ -0,0 +1,21 @@
{%- macro event_type_counts(ref_table, ref_column, count_column) -%}

{% set list = dbt_utils.get_column_values(
table = ref( ref_table ),
column = ref_column
) %}

{%- for list_item in list -%}
count(
case
when {{ ref_column }} = '{{ list_item }}'
then {{ count_column }}
else null
end
) as {{list_item}}_count
{%- if not loop.last -%}
,
{%- endif -%}
{%- endfor -%}

{%- endmacro -%}
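
A possible call site for the macro, as a sketch only. The model below is hypothetical; it assumes `stg_postgres_events` exposes `session_id`, `event_type`, and `event_id`, and that the `dbt_utils` package is installed.

```sql
-- Hypothetical model: one row per session with a count column per event type.
select
    session_id,
    {{ event_type_counts('stg_postgres_events', 'event_type', 'event_id') }}
from {{ ref('stg_postgres_events') }}
group by 1

-- If event_type held only 'page_view' and 'checkout' (illustrative values),
-- the macro call would render roughly as:
--   count(case when event_type = 'page_view' then event_id else null end) as page_view_count,
--   count(case when event_type = 'checkout' then event_id else null end) as checkout_count
```

Because each distinct value becomes part of a column name, the values need to be valid identifiers. As far as I know, `dbt_utils.get_column_values` queries the relation when the project compiles, so the referenced model has to exist in the warehouse before a model using this macro is built.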
6 changes: 6 additions & 0 deletions greenery/macros/lbs_to_kgs.sql
@@ -0,0 +1,6 @@
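{% macro lbs_to_kgs(column_name, precision=2) %}
    {#- Treat the sentinel value -99 as missing, then convert pounds to kilograms.
        No cast is applied: on Snowflake, NUMERIC without arguments is NUMBER(38, 0),
        which would round the result to a whole number before ROUND runs. -#}
    ROUND(
        CASE WHEN {{ column_name }} = -99 THEN NULL ELSE {{ column_name }} END / 2.205,
        {{ precision }}
    )
{% endmacro %}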
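
A sketch of how the macro might be called from a model. The model name `stg_shipments` and the columns `shipment_id` and `weight_lbs` are hypothetical and do not appear in this project.

```sql
-- Hypothetical model and column names, for illustration only.
select
    shipment_id,
    {{ lbs_to_kgs('weight_lbs') }} as weight_kgs,                  -- default precision of 2
    {{ lbs_to_kgs('weight_lbs', precision=1) }} as weight_kgs_1dp  -- one decimal place
from {{ ref('stg_shipments') }}
```

Rows where `weight_lbs` is -99 come back as null in both output columns, since the macro maps that value to null before dividing.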
9 changes: 9 additions & 0 deletions greenery/macros/positive_values.sql
@@ -0,0 +1,9 @@
{% test positive_values(model, column_name) %}


select *
from {{ model }}
where {{ column_name }} < 0


{% endtest %}
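
dbt treats a generic test as failing when its query returns rows. A sketch of what `positive_values` compiles to, run against hypothetical inline rows instead of a real model:

```sql
-- Hypothetical rows standing in for {{ model }}; quantity stands in for {{ column_name }}.
with order_items as (
    select 1 as order_id, 3 as quantity
    union all select 2, -1
    union all select 3, 0
)

-- Same filter as the test body: any row returned is a failure.
select *
from order_items
where quantity < 0;
```

Only the row with `quantity = -1` comes back, so the test would fail on it. A zero passes, because the filter is strictly `< 0`; the test checks for non-negative values, not strictly positive ones.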
27 changes: 27 additions & 0 deletions greenery/models/example/my_first_dbt_model.sql
@@ -0,0 +1,27 @@

/*
Welcome to your first dbt model!
Did you know that you can also configure models directly within SQL files?
This will override configurations stated in dbt_project.yml
Try changing "table" to "view" below
*/

{{ config(materialized='table') }}

with source_data as (

select 1 as id
union all
select null as id

)

select *
from source_data

/*
Uncomment the line below to remove records with null `id` values
*/

-- where id is not null
6 changes: 6 additions & 0 deletions greenery/models/example/my_second_dbt_model.sql
@@ -0,0 +1,6 @@

-- Use the `ref` function to select from other models

select *
from {{ ref('my_first_dbt_model') }}
where id = 1
21 changes: 21 additions & 0 deletions greenery/models/example/schema.yml
@@ -0,0 +1,21 @@

version: 2

models:
  - name: my_first_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null

  - name: my_second_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null
52 changes: 52 additions & 0 deletions greenery/models/marts/core/fact_dims/_core_models.yml
@@ -0,0 +1,52 @@
version: 2

models:
  - name: dim_users
    description: >
      Denormalized user-grained table with aggregation information for key
      facts.
    columns:
      - name: user_id
        tests:
          - not_null
          - unique

  - name: dim_date
    description: >
      Generic date dimension starting from the first day of the earliest month in
      our source data and going until 1 year after today.
    columns:
      - name: date_day
        tests:
          - not_null
          - unique

  - name: dim_products
    description: >
      Denormalized product-grained table with aggregation information for key
      facts.
    columns:
      - name: product_id
        tests:
          - not_null
          - unique

  - name: fct_orders
    description: >
      Since there is only 1 source for orders information, the orders fact will
      be nearly identical to the orders staging table with additional calculations.
    columns:
      - name: order_id
        tests:
          - not_null
          - unique

  - name: fct_events
    description: >
      Since there is only 1 source for events information, the events fact will
      be nearly identical to the events staging table with additional calculations.
    columns:
      - name: event_id
        tests:
          - not_null
          - unique
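
For readers new to these tests: dbt's built-in `unique` and `not_null` checks are, roughly, queries that return the offending rows. A sketch of the `unique` check on hypothetical inline rows standing in for `dim_users` (the exact SQL dbt generates may differ by version):

```sql
-- Hypothetical rows: user 'u2' appears twice.
with dim_users as (
    select 'u1' as user_id
    union all select 'u2'
    union all select 'u2'
)

-- Roughly what the `unique` test runs: any row returned is a failure.
select
    user_id as unique_field,
    count(*) as n_records
from dim_users
where user_id is not null
group by user_id
having count(*) > 1;
```

This returns a single row for `u2` with two records, so the test would fail. The `not_null` check is simpler still: it selects rows where the column is null.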
32 changes: 32 additions & 0 deletions greenery/models/marts/core/fact_dims/dim_date.sql
@@ -0,0 +1,32 @@
{{
config(
materialized='table'
)
}}

/*
DIM_DATE:
Generic Date dimension starting from the first day of the earliest month
in our source data and going until 1 year after today.
*/

with date_spine as (
{{ dbt_utils.date_spine(
datepart = "day",
start_date = "cast('2021-02-01' as date)",
end_date = "cast(dateadd('year', 1, current_date()) as date)"
)
}}
),

results as (
select
date_day,
date_part('year', date_day) as year,
date_part('month', date_day) as month,
date_part('day', date_day) as day

from date_spine
)

select * from results
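
To see the shape of the rows this model produces without running dbt, here is a rough Snowflake-only sketch that builds a seven-day spine from the same start date. It is not how `dbt_utils.date_spine` is implemented, and the seven-row limit is arbitrary.

```sql
-- Assumes Snowflake: generator() and seq4() are Snowflake-specific.
with date_spine as (
    select
        dateadd(
            'day',
            row_number() over (order by seq4()) - 1,  -- gap-free offsets 0, 1, 2, ...
            cast('2021-02-01' as date)
        ) as date_day
    from table(generator(rowcount => 7))
)

select
    date_day,
    date_part('year', date_day) as year,
    date_part('month', date_day) as month,
    date_part('day', date_day) as day
from date_spine
order by date_day;
```

The result runs from 2021-02-01 through 2021-02-07, one row per day. Per the dbt_utils documentation, `date_spine` includes the start date but stops before the end date, so the model's last row is the day before "one year from today".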
42 changes: 42 additions & 0 deletions greenery/models/marts/core/fact_dims/dim_products.sql
@@ -0,0 +1,42 @@
{{
config(
materialized='table'
)
}}

with product_source as (
    select * from {{ ref('stg_postgres_products') }}
),

-- Aggregate order items to one row per product before joining, so the
-- events join below cannot multiply quantities.
product_orders as (
    select
        product_id,
        count(distinct order_id) as total_orders,
        sum(quantity) as total_sold
    from {{ ref('stg_postgres_order_items') }}
    group by 1
),

-- Aggregate events to one row per product for the same reason.
product_events as (
    select
        product_id,
        count(distinct event_id) as total_events
    from {{ ref('stg_postgres_events') }}
    group by 1
),

results as (
    select
        -- Primary Key
        p.product_id,
        p.name,
        p.price,

        -- Aggregated Order Information
        coalesce(o.total_orders, 0) as total_orders,
        coalesce(o.total_sold, 0) as total_sold,

        -- Aggregated Event Information
        coalesce(e.total_events, 0) as total_events

    from product_source p
    left join product_orders o
        on p.product_id = o.product_id
    left join product_events e
        on p.product_id = e.product_id
)

select * from results
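
Joining two one-to-many tables to the same parent before aggregating repeats each row of one table once per matching row of the other, which inflates `sum()` while leaving `count(distinct ...)` intact. A toy sketch with hypothetical inline rows:

```sql
-- Hypothetical rows: one product, two order items, three events.
with products as (
    select 'p1' as product_id
),

order_items as (
    select 'p1' as product_id, 'o1' as order_id, 2 as quantity
    union all select 'p1', 'o2', 1
),

events as (
    select 'p1' as product_id, 'e1' as event_id
    union all select 'p1', 'e2'
    union all select 'p1', 'e3'
)

select
    p.product_id,
    count(distinct o.order_id) as total_orders,
    count(distinct e.event_id) as total_events,
    sum(o.quantity) as total_sold_after_fan_out
from products p
left join order_items o
    on p.product_id = o.product_id
left join events e
    on p.product_id = e.product_id
group by 1;
```

The join yields 2 × 3 = 6 rows, so the summed quantity is 9 although only 3 units were sold; the distinct counts stay at 2 orders and 3 events. Aggregating each child table to one row per product before joining avoids the inflation.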