How to automate table-level refresh in Power BI


The refresh schedule on the Power BI portal is set at the dataset level. This means all the tables refresh on the same schedule. Some models may not have a problem with this, but many will. This article explains how to automate table-level refresh in Power BI.

This refresh schedule means you will be creating a bigger workload than you really need, compared to a refresh at the table level, if that were possible.

There are some options to ignore or work around this, and there is one option which requires more effort but can solve the problem. This article will analyse these options and go deeper into how to build custom refresh automation solutions.

In this article, I will also refer to many solutions previously published in other blog posts and explain where they fit when trying to solve bigger problems. Some you may already know; others will be completely new to you. Be prepared for a long journey of new knowledge which may change your point of view about managing Power BI refreshes.

Why customize Power BI refresh

Here are some reasons to customize the refresh in Power BI:

  • Schedule the refresh at the table level, according to the frequency each table needs
  • Go beyond the scheduled refresh limit in Power BI (eight refreshes per day on shared capacity, 48 on Premium)
  • Create refreshes by partition. This is especially important when using incremental refresh. (This will be a subject for a different article.)

Ignore the refresh frequency problem

In many scenarios, scheduling all refreshes at once will not be a big deal. The tables which could live with a less frequent refresh schedule are probably small dimension tables, and the workload created by their refresh is not that big.

That’s why in many scenarios it’s safe to ignore this problem; it’s not a completely wrong option.

Use dataflows to break down the ETL process

One solution is to use dataflows to break down the tables according to their refresh needs. Each dataflow can have its own refresh schedule, solving the workload problem at the data source.

One single dataset can load the data from multiple dataflows, and each table in the dataset has its own storage mode. The solution is to use Import storage mode on all the tables coming from the dataflows.

The bad side: the data will be duplicated, in the dataflows and in the dataset. You will also have many refresh schedules to deal with: one for each dataflow, as expected, plus one for the dataset.

The good side: the workload on the data source will be very precisely configured according to the needs. There will be a second workload, but it happens inside Power BI, between the dataset and the dataflows, so it will be less critical.

You may be wondering why not use DirectQuery. It’s simple: performance. Relationships between tables using Import mode result in better query performance than DirectQuery mode.

The solution

There is no way to refresh a single table from the UI, but you can refresh a single table, or even a single partition, using the XMLA (XML for Analysis) connection to Power BI. The XMLA connection is a Power BI connection endpoint, open to any developer who would like to build a tool and connect to it. Microsoft allows connecting to the XMLA endpoint using SQL Server Management Studio (SSMS). You can also connect to the XMLA endpoint using the Tabular Editor.

Both tools can refresh individual objects interactively, and they can also generate scripts for the refresh. However, they can’t schedule the refresh; it’s always an interactive task.

You can schedule a recurring execution of the generated script. To do so, use an Azure Automation account, which allows you to schedule a PowerShell script to connect to Power BI and refresh the table.

The Power BI connection is made using an identity from the same tenant. You will need to register an application to provide this identity.

Generating the refresh script

The first step is to generate the refresh script, or just copy it from below. Here are the steps to generate it:

  1. On the Power BI Portal, navigate to a workspace and open Workspace Settings.
  2. Select Premium

The XMLA endpoint is only available on Premium or Premium Per User (PPU) subscriptions.

Image showing the workspace settings. The Workspace Connection is highlighted

  3. Copy the XMLA endpoint address (Workspace Connection in the dialog)
  4. Open SSMS
  5. In the Object Explorer window, select Connect -> Analysis Services

You may not need to do this if the Connect to Server window is already open; in that case, just change the Server type to Analysis Services.

An image showing the Object Explorer of SSMS. Analysis Services is selected under Connect

  6. Paste the XMLA endpoint as the server’s name
  7. Choose Azure Active Directory – Universal with MFA authentication

An image showing SSMS connecting to the Power BI workspace

  8. Type the e-mail of your Power BI login
  9. Click the Connect button. You will be requested to authenticate
  10. In the Object Explorer window, open the database. Each dataset in the workspace will appear as a different database.

An image showing the Object Explorer and connected to Power BI. The Demo 5 database is selected and shows two tables

  11. Right-click the table you would like to refresh
  12. Select the Process Table menu option

An image showing Product is selected and the right-click menu

  13. On the Mode drop-down, select the Process Full mode

There are many different modes you can use for different scenarios. You can check additional details here.

An image showing the Process Table(s) dialog. Process Full is selected. The Product table is checked.

  14. Select the table you would like to process. Note that you can choose multiple tables and create one single script to process them together, if they need the same refresh frequency.
  15. Using the Script button, select Script Action to New Query Window

An image showing the Process Table(s) dialog. The Script menu shows Script Action to New Query Window

  16. Click the Cancel button. You don’t need to process it interactively.

A script like the one below will be generated:
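This example uses the Demo 5 dataset and the Product table from this walkthrough; your database and table names will differ:

```json
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "Demo 5",
        "table": "Product"
      }
    ]
  }
}
```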

You may notice it’s not XML; it’s JSON. The XMLA endpoint also accepts commands in JSON format, known as TMSL (Tabular Model Scripting Language).

Side option: Interactive refresh using the Tabular Editor

An alternative option for this situation, although it doesn’t meet the goal of this article, is to use the Tabular Editor to generate and execute scripts.

Tabular Editor can generate XMLA scripts but can’t execute them. On the other hand, it can generate C# scripts using TOM (Tabular Object Model) and execute those.

There are three interesting references about this:

  • This article explains how to generate scripts to make multiple updates in batch using the Tabular Editor.
  • This article explains an application created to be executed inside the Tabular Editor which helps to easily generate the refresh scripts.
  • This video summarizes many options about interactively refreshing tables using Tabular Editor.

These are interesting options, but in this article, I will proceed with the automation of a PowerShell script.

Generating an identity for authentication

The PowerShell script will need an identity to authenticate to Power BI. You create this identity in Azure AD by registering an application.

There is another blog post explaining how to do it in detail for Power BI, so I will leave you with this link.

Installing the ADOMD library

PowerShell is the scripting language I will use for this example. However, PowerShell alone is not enough: you need a client library to connect to the XMLA endpoint. Some examples are ADOMD, AMO, or TOM.

I will illustrate the example with ADOMD. It’s enough to execute an XMLA script, and I believe it will be more familiar to most developers, since it uses the ADO model (Connection/Command).

You can learn more about these client libraries on the following links:

The script used in this example requires the ADOMD library, and the library must be installed on the machine where it will run. In this case, you will test it locally before scheduling it with the Automation account.

You can access the link and use the installer for the library; it will be enough for local execution.

Configuring the ADOMD library for use in an Automation account requires some additional steps. I will cover this later in this article.

Creating the PowerShell script for the refresh

Below you can find the complete PowerShell script you can execute using the PowerShell ISE. Here are some details of this script:

  • The connection string contains the XMLA endpoint for the workspace, but it also contains an initial catalog. The Initial Catalog contains the name of a dataset. Each dataset on the workspace behaves as a database and you will be connecting to one specific dataset.
  • The service principal is included in the connection string as the username, using the format app:AppId@TenantId, and the password is the secret value.
  • ADOMD contains the traditional ADO objects: Connection/Command/Adapter/DataReader. For this example, Connection and Command are enough.
  • The script uses Command’s ExecuteNonQuery method. This method is used when you want to execute something, but you don’t want to bring any information back.
  • The script opens the connection at the last possible moment and closes it as soon as possible. This is old-school good practice.
  • The script calls the Connection’s Dispose method, a good practice used with .NET objects.

This is the PowerShell script. You can copy/paste it into the PowerShell ISE and execute it; it should work after you modify it for your environment.
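The following is a minimal sketch of that script, reconstructed from the details above. The workspace name (MyWorkspace), the library path, and the credential placeholders are assumptions; adjust them for your environment.

```powershell
# Load the ADOMD client library - adjust the path and version to your installation
Add-Type -Path "C:\Program Files\Microsoft.NET\ADOMD.NET\160\Microsoft.AnalysisServices.AdomdClient.dll"

# Service principal credentials - placeholders, replace with your app registration values
$appId    = "<application id>"
$tenantId = "<tenant id>"
$secret   = "<application secret>"

# The Data Source is the XMLA endpoint copied from the workspace settings;
# the Initial Catalog is the dataset name. MyWorkspace is a placeholder.
$connectionString = "Data Source=powerbi://api.powerbi.com/v1.0/myorg/MyWorkspace;" +
                    "Initial Catalog=Demo 5;" +
                    "User ID=app:$appId@$tenantId;Password=$secret"

# The TMSL command generated earlier, refreshing a single table
$refreshScript = @"
{
  "refresh": {
    "type": "full",
    "objects": [ { "database": "Demo 5", "table": "Product" } ]
  }
}
"@

$connection = New-Object Microsoft.AnalysisServices.AdomdClient.AdomdConnection($connectionString)
$command = New-Object Microsoft.AnalysisServices.AdomdClient.AdomdCommand($refreshScript, $connection)

# Open at the last possible moment, execute without returning data,
# then close and dispose as soon as possible
$connection.Open()
$command.ExecuteNonQuery() | Out-Null
$connection.Close()
$connection.Dispose()
```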

Checking the results

In the Power BI portal, the refresh will appear as if it were a dataset refresh: it shows as the last refresh date of the dataset. This happens even if you open the Refresh Summary on the Admin Portal page; the view is always by dataset.

 

An image showing the Refresh Summary for Demo5. The data was refreshed Today, 12:55 pm and 12:01 pm

Power BI has the information about the last refresh date, but you need to extract the information through the XMLA endpoint. You can do that using SSMS.

There are two ways to get this information:

  • Checking the table properties in SSMS. SSMS shows the last refresh date. You can check this on both tables, and you will see the dates are different, because the refresh was done on only a single table.
  • Querying the system tables. Power BI behaves as an SSAS server, and it also has system tables. There are limitations about what data can be retrieved, but they are very useful.

Querying Power BI System Tables

The queries to system tables can be built in the MDX window, but they are neither MDX nor SQL. They are in fact another syntax, called DMX. You can think of it as a simplified SQL which doesn’t support JOIN, GROUP BY, and more.

You can start with a simple select to show all the tables, including system tables:
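For example, DBSCHEMA_TABLES is a standard schema rowset that lists all the tables, including the system ones:

```
select * from $system.DBSCHEMA_TABLES
```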

An image showing the results of the query. There are 2 system tables for Product and TransactionHistory and two for the TABLE objects

The secret to finding the information is how the refresh happens. The refresh is always executed by partition. Power BI doesn’t care whether you request it by dataset or by table; the execution will always be at the partition level.

You can query the table $system.TMSCHEMA_TABLES to get the ID of the table, but the last refresh date is only located in the table $system.TMSCHEMA_PARTITIONS. You can retrieve this information with a query like the following:
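A sketch of such a query; the TableID value 34 is hypothetical, so use the ID you found for your table in $system.TMSCHEMA_TABLES:

```
select [TableID], [Name], [RefreshedTime]
from $system.TMSCHEMA_PARTITIONS
where [TableID] = 34
```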

An image showing the refresh date and time for the Product table

The older folks may still remember what DMX means: Data Mining Extensions. It’s an old language created to be used with the data mining feature in SSAS. I confess I thought this feature was long gone, but it was only deprecated in SQL Server 2017. It’s very curious that the language originally built to query data mining models is now used to query Power BI System tables.

One more way to capture the refresh activity

Another interesting way to view the refreshes is by capturing the refresh activity using SQL Server Profiler and checking the details of the activity using a Power BI Dashboard.

A dashboard for this purpose already exists and it’s very interesting. It depends on the captured data from SQL Profiler. Because of that, it’s a tool to be used for short tests, not to analyse production activity.

You can check the details about this dashboard, and a step-by-step guide on how to capture the information and use it.

I executed this test as well. The image below shows the dashboard generated from the captured trace after the execution of the script. As you may notice, it shows only the Product table, the only one included in the refresh.

An image of the Job Tracing Report

Automating the refresh

After building and testing the script for the refresh, it’s time to automate the execution of the script. The first step is creating an automation account in Azure.

Create an automation account

If you already have an automation account or have already worked with one before, you can skip this part of the article.

Here’s a small step-by-step to provision the Automation account that will be used to schedule the refresh of the tables:

  1. In the Azure Portal, click the Create a Resource icon

An image showing Azure Services Create a Resource

  2. In the search text box, type Automation to find the automation account

An image showing Azure Portal and searching for Automation

  3. On the Automation provisioning page, click the Create button

An image showing Automation Create

  4. On the Create an Automation Account page, click the Create New button below the Resource Group text box.

Create a new resource group called PBIAutomationRG for the new Automation account.

An image showing the properties when creating an automation account

  5. Select the region closest to you
  6. Click the Review + Create button. For this example, you don’t need to customize the additional configurations in the Advanced or Networking tabs.
  7. In the review window, click the Create button

That’s it, the Automation Account is created.

Configuring the Automation Account to use ADOMD

After provisioning the Automation Account, it’s time to configure it to support the ADOMD library. This is done by managing the modules imported on the Automation Account.

There is another blog post explaining how to use AMO and ADOMD in an Automation runbook; it has all the details you will need. It’s interesting that AMO already has a PowerShell module, but ADOMD doesn’t, so the post shows both scenarios: using an existing module from the gallery, or adding a new module by uploading a zip file containing the DLL: https://sqlitybi.com/how-to-use-amo-and-adomd-in-azure-powershell-runbooks/

Creating an Automation Runbook

Here are the steps to create and test the automation runbook:

  1. Open your automation account.
  2. In the Runbooks blade, click the Create a runbook button

An image showing the Runbooks blade of the automation account. Create a runbook

  3. Fill in the runbook information. You will use a PowerShell 5.1 runbook.

An image showing the Create a runbook pages with properties filled in

  4. Click Create
  5. In the Edit PowerShell Runbook window, paste the code you built before and executed in the PowerShell ISE
  6. Replace the first line, which loads the ADOMD module, with the following lines:

An image showing how to edit the PowerShell Runbook code
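A minimal sketch of the replacement, assuming the zip you uploaded as a module was named Microsoft.AnalysisServices.AdomdClient, as in the linked post:

```powershell
# Import the ADOMD module uploaded to the Automation account.
# The module name is an assumption; use the name you gave the uploaded zip.
Import-Module Microsoft.AnalysisServices.AdomdClient
```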

  7. Click the Save button
  8. Click the Publish button
  9. On the Runbook page, click the Start button and confirm by clicking Yes

Every time you execute a Runbook, a job is created. You will be automatically taken to the job window.

An image showing where to start the Runbook.

  10. On the Job window, wait until the job is completed

An image showing the status of the job

  11. Check whether the refresh was successful. You can use SSMS for precise information, and look in the portal to confirm it as well.

Scheduling the runbook execution

The runbook is created. Now it’s time to schedule it for automatic execution:

  1. On the Jobs window, in the breadcrumbs, click the name of the runbook to return to the Runbook window.
  2. On the Schedules blade, click the Add a schedule button

An image showing the Schedules blade where you Add a Schedule

  3. On the Schedule Runbook window, click the option Link a schedule to your runbook

An image showing where to link the run book to the schedule

  4. On the Schedules window, click the Add a Schedule button

An image showing Add a schedule

  5. Define the schedule as you wish and click the Create button

An image showing the properties of the new schedule

  6. Click the OK button on the Schedule Runbook window

The schedule object is linked to the Runbook, but it’s stored in the Automation account. This makes the same schedule available to many different runbooks, another advantage of the Automation account.

Automate Power BI table refresh

You may have just discovered an entirely new point of view about how to manage Power BI data refreshes. This is just the beginning: the automation possibilities only increase from this point forward.

If you liked this article, you might also like Power BI: ETL or not ETL, that’s the question.


About the author

Dennes Torres


Dennes Torres is a Data Platform MVP and Software Architect living in Malta who loves SQL Server and software development and has more than 20 years of experience. Dennes can improve Data Platform architectures and transform data into knowledge. He moved to Malta after more than 10 years leading the devSQL PASS Chapter in Rio de Janeiro, and is now a member of the leadership team of the MMDPUG PASS Chapter in Malta, organizing meetings, events, and webcasts about SQL Server. He is an MCT, MCSE in Data Platforms and BI, with more titles in software development. You can get in touch on his blog https://dennestorres.com or at his work https://dtowersoftware.com