How to Load a Date Dimension in Microsoft Fabric using Macros in dbt?

Comments 0

Share to social media

With the popular data transformation tool dbt (data build tool), we have a bunch of interesting features at our disposal to write SQL more efficiently. One of those features is macros, which we’ll introduce in this article (don’t worry; they’re not related to the Excel VBA macros of your nightmares).

The article A gentle introduction to dbt explains how you can get dbt in the cloud version, how you can set up a free account, and how to create a connection to a Microsoft Fabric warehouse. The article Loading Models from Source Data with dbt expands this setup further by showing how you can define your source tables (loading source tables is not possible in dbt, the tool can only do transformations using SQL) and how to load some sample dimension and fact tables. It’s recommended to go through these articles first if you haven’t already because we will build upon them.

Using Macros to Create a Date Dimension

Every data warehouse typically has a date dimension. Since almost all data analysis involves time in some manner, it’s probably the most important dimension in your data warehouse project. There are plenty of online resources on how to create an SQL SELECT statement that will generate a date table. In this article, we’ll go one step further and create one using reusable logic. In dbt, we can accomplish this with packages and macros.

Installing a Package into your dbt Project

There are numerous packages available for dbt. Packages are extensible logic that you can import into your dbt project, very similar to extensions in Visual Studio Code for example, or modules/packages in Python. You can find a list of available packages at hub.getdbt.com, and like dbt they are open-source. One widely used package is dbt-util, which offers a range of useful macros. To install a package, we need to add a packages.yml file to the project.

A screenshot of a computer

AI-generated content may be incorrect.

Then create a new packages.yml file:

A screenshot of a computer

AI-generated content may be incorrect.

Inside this file, we can specify the following configuration:

A screenshot of a computer

AI-generated content may be incorrect.

This will include the dbt_utils package with the specific version 1.3.0. If you don’t care about a specific version and rather want the latest version of a specific release, you can use the following YAML code:

In the command line at the bottom of the screen, we now need to run the command dbt deps. This command will install the most recent versions of the dependencies listed in the packages.yml file.

A screenshot of a computer

AI-generated content may be incorrect.

The package is now installed and its macros can be used in your dbt models.

Creating the Date Dimension

The package dbt_util has a useful macro called date_spine. It will take a start and end date and a time interval such as day (equivalent to datepart parameters in T-SQL). The macro will generate a list of dates between the start and end date. It’s quite similar to how we use tally tables in SQL Server.

Let’s add a new file to the dbt project containing the model for our DimDate.

A screenshot of a computer

AI-generated content may be incorrect.



Right click and create a new file:

A screenshot of a computer

AI-generated content may be incorrect.

We can generate the list of dates using the date_spine macro with the following code (which is SQL embedded with Jinja):

However, when we build our model, we get the following error:

A screenshot of a computer

AI-generated content may be incorrect.



Turns out the date_spine macro from the dbt_utils package uses ORDER BY inside another function called GENERATE_SERIES, which is used by date_spine. This might work on other data platforms, but T-SQL doesn’t allow it. This brings us to a problem that can occur when working with 3rd party packages: not all of them will be compliant with Microsoft Fabric.

You can implement platform-specific code within a macro (the database adaptor within dbt will then use the correct implementation). Still, for Fabric, there are no such implementations yet in the dbt-utils package, which is inconvenient. dbt-utils is widely used within the dbt community.

There are Fabric-specific utility macros within the dbt-fabric package, but at the time of writing there’s no date_spine equivalent. However, not to worry; we can write our own macro!

Creating our First Macro

A macro in dbt is a piece of reusable SQL code. It’s one of the most powerful features in dbt, and it’s basically “dynamic SQL on steroids”. Let’s illustrate with an example. In the macros folder, we create a new file called my_date_spine.sql:

A screenshot of a computer

AI-generated content may be incorrect.

Inside the file, we put the following code, which is again a combination of SQL and Jinja:

We use the built-in T-SQL functions GENERATE_SERIES and DATEDIFF to generate our list of dates. In the first Jinja line, we define the name of the macro and its input parameters (start_date and end_date). Inside the SQL code, we can reference these parameters by using the double curly brackets.

When we go back to the date dimension, we can call the macro like this:

When we compile the code, we can see dbt has put the SQL code from the macro inside our model, and any parameter references are replaced as well:

A screenshot of a computer program

AI-generated content may be incorrect.

This is the power of macros within dbt. They finally make SQL code actually reusable without the performance hits that we tend to get with SQL Server user-defined functions.

If we hit preview, we can see we get the list of dates we need for our date dimension:

A screenshot of a computer program

AI-generated content may be incorrect.

Now, we can finish our SELECT statement by using date functions to calculate the typical columns of a date table: year, quarter, month, week, and so on.

You could join against reference tables that contain the holidays for each year to expand the date dimension or add extra columns for fiscal dates.

Conclusion

In this article, we saw how installing packages can expand the functionality of debt. These give us access to additional macros, which are useful tools for reusing SQL functionality. You can always write your own macro, which we did to generate a list of dates for our date dimension.

Article tags

Load comments

About the author

Koen Verbeeck

See Profile

Koen Verbeeck is a data professional working at AE. He helps organizations to get insight in their data and to improve their analytics solutions. Koen has over a decade of experience in developing data warehouses, models and reports using the Microsoft data platform. Since 2017 Koen is a Microsoft Data Platform MVP. He has a blog at http://www.sqlkover.com, writes articles for MSSQLTips.com and is a frequent speaker at Microsoft Data Platform events.