Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT/Calculate business days elapsed based on two inputs #31

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
17 changes: 17 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -196,6 +196,23 @@ This macro only seeks to add or update the tags which are specified in dbt. It w
If you need this behaviour, it usually comes naturally as dbt drops and recreates tables/views for most materializations.
If you are using the incremental materialization, be aware of this limitation.

### snowflake_utils.create_udfs ([source](macros/create_udfs.sql))
This macro will create the udfs listed in the macro (found in [udfs](macros/udfs) folder) into the target schema on-run-start. Once these UDFs are created, this allows the user to call UDFs within their SQL queries/models as part of your dbt project.

#### Arguments
None: The [on-run-end context object](https://docs.getdbt.com/reference/dbt-jinja-functions/on-run-end-context).

#### Usage

The macro must be called as part of on-run-start, so add the following to dbt_project.yml:
```
on-run-end: "{{ create_udfs() }}"
```
#### UDFs currently generated
* udf_business_days
* udf_business_hours

See individual yaml files within [udfs](macros/udfs/) folder to learn more about syntax and use cases.

----

Expand Down
8 changes: 8 additions & 0 deletions macros/create_udfs.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{% macro create_udfs() %}

create schema if not exists {{target.schema}};

{{ create_udf_business_days() }};
{{ create_udf_business_hours() }};

{% endmacro %}
30 changes: 29 additions & 1 deletion macros/macros.md
Original file line number Diff line number Diff line change
Expand Up @@ -123,4 +123,32 @@ This macro only seeks to add or update the tags which are specified in dbt. It w
If you need this behaviour, it usually comes naturally as dbt drops and recreates tables/views for most materializations.
If you are using the incremental materialization, be aware of this limitation.

{% enddocs %}
{% enddocs %}

{% docs create_udfs %}

Create_udfs is a macro that does exactly as it says, calls the individiual
UDFs that are nested in macros/udfs and makes them available to use in
your dbt transformations within Snowflake.

In order to facilitate the use of UDFs within your project, you must add
the following to your dbt_project.yml file;

on-run-start:
- '{{ create_udfs() }}'

Users must have the proper permissions to create UDFs, orchestrated
through the following command as an owner/administrator:
GRANT USAGE ON LANGUAGE PLPYTHONU TO <user>

Why/motivation:
While many complex transformations are possible with just SQL, creating
UDFs allows a project to begin encorporating python packages
to make certain transformations incredibly easy and efficient.
Additionally, leveraging UDFs via dbt macros allows for:
- the ability to version control,
- the ability to read transformations from your project without needing
to interact with Snowflake
- to maintain separate dev/prod versions of the UDFs.

{% enddocs %}
5 changes: 4 additions & 1 deletion macros/macros.yml
Original file line number Diff line number Diff line change
Expand Up @@ -28,4 +28,7 @@ macros:
description: '{{ doc("apply_meta_as_tags") }}'
arguments:
- name: results
description: The on-run-end context object
description: The on-run-end context object

- name: create_udfs
description: '{{ doc("create_udfs") }}'
91 changes: 91 additions & 0 deletions macros/udfs/business_days.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
{% macro create_udf_business_days() %}

create or replace function {{target_schema}}.udf_business_days(start_datetime DATE, end_datetime DATE, weekmask STRING, country STRING)
returns int
language python
runtime_version = 3.8
packages = ('numpy','holidays')
handler = 'business_days_py'
as

$$
import numpy as np
import holidays as holidays

def business_days_py(start_date, end_date, weekmask, country):
# calculate business days between two dates
years = [*range(1990,2030)]

holiday_list = list(holidays.country_holidays(country, years=years))

return np.busday_count(
start_date,
end_date,
weekmask = weekmask,
holidays = holiday_list
)

$$

-- overload file to default weekmask as Monday to Friday (weekmask='1111100')
create or replace function {{target_schema}}.udf_business_days(start_datetime DATE, end_datetime DATE, country STRING)
returns int
language python
runtime_version = 3.8
packages = ('numpy','holidays')
handler = 'business_days_py'
as

$$
import numpy as np
import holidays as holidays

def business_days_py(start_date, end_date, country):

# calculate business days between two dates
years = [*range(1990,2030)]
weekmask='1111100'

holiday_list = list(holidays.country_holidays(country, years=years))

return np.busday_count(
start_date,
end_date,
weekmask = weekmask,
holidays = holiday_list
)

$$

-- finally, overload function to default weekmask as M-F and country as 'US'
create or replace function {{target_schema}}.udf_business_days(start_datetime DATE, end_datetime DATE)
returns int
language python
runtime_version = 3.8
packages = ('numpy','holidays')
handler = 'business_days_py'
as

$$
import numpy as np
import holidays as holidays

def business_days_py(start_date, end_date):

# calculate business days between two dates
years = [*range(1990,2030)]
weekmask='1111100'
country='US'

holiday_list = list(holidays.country_holidays(country, years=years))

return np.busday_count(
start_date,
end_date,
weekmask = weekmask,
holidays = holiday_list
)

$$

{% endmacro %}
49 changes: 49 additions & 0 deletions macros/udfs/business_days.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
version: 2

macros:
- name: create_udf_business_days
description: >
This macro creates a UDF within the target schema that can be used to
calculate business days (excluding holidays) between two date fields
in a Snowflake table. The macro mearly creates the UDF within Snowflake,
and is called by the create_udfs macro. You must ensure that create_udfs
is called on-run-start in dbt_project.yml.

To utilize the UDF within your dbt project, proper syntax is:

select
*,
udf_business_days(start_date, end_date, country, weekmask)
as business_days_elapsed
from sample_table

Notes:
- The date fields passed must be of type date or datetime (timestamp_ntz).
- The country field is optional as it is defaulted to 'US'. This can be a
column within your table if you have businesses in different geographic
locations, or can be set as a string for the entire dataset. If looking
to use a different country, must use the ISO 3166-1 alpha-2 country code
as defined here:
https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes
- The weekmask field is a string indicating which days of the week are to
be considered business days, and defaults to the standard workweek being
Monday through Friday via weekmask = '1111100'; the weekmask starting
with Monday and ending with Sunday, with a 1 indicating viable business
day and 0 indicating a non-business day. This field is optional.
- Regarding defaulted arguments,calls can be made with the following
syntax:
udf_business_days(start_date, end_date, country, weekmask)
udf_business_days(start_date, end_date, country)
udf_business_days(start_date, end_date)
- This UDF only considers holidays between Jan 1 1990 and Dec 31 2030, as
is currently hardcoded. This can be changed in the macro source code.
- If increased granularity is desired for the region, you must modify the
macro to apply regional (state or province) holidays as described as
defined here:
holiday_list = holidays.country_holidays(country, subdiv='PR')
- The holidays python package does not include the end date in its
calculations of the duration. To account for this, you can add a day to
your end date to make the calculation inclusive.

docs:
show: false
Loading