Let’s analyse why we would like to query a SQL Endpoint. Once we understand why, we can dig into how to run a query against one.
We use notebooks to connect directly to a lakehouse. Except for the T-SQL notebook, notebooks have a default lakehouse and work with it directly from Spark. However, accessing other data objects may be more complex.
Why execute a Query on a SQL Endpoint from a Notebook
Querying a SQL Endpoint from a notebook is not an everyday need, but there are several scenarios where it becomes necessary.
- Most data objects have a SQL Endpoint: Data Warehouse, Fabric Database, mirrored database. If you need to access data from these objects, you will need to query the SQL Endpoint.
- I wrote before about the delay SQL Endpoints can have in reflecting lakehouse data. You may want to query the SQL Endpoint to analyse this.
- There are some queries and features we can use through a SQL Endpoint but can’t execute directly against a lakehouse.
These are only a few examples I identified. Let’s talk in the comments about what other scenarios may create this need.
Querying the SQL Endpoint: The method to use
A SQL Endpoint uses TDS, the same connection protocol as SQL Server and other Microsoft SQL flavours. Most software development languages are capable of making a connection to a Microsoft SQL flavour.
The notebooks use powerful languages, such as pySpark and Scala, so there are multiple methods to make the connection to a SQL Endpoint. For example, we can use ODBC, JDBC or many other connection options.
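For illustration, here is a minimal sketch of one of these generic routes, using Spark's JDBC reader from Scala. The server name, database name and authentication details are placeholder assumptions you would replace with your own values, and the exact connection settings depend on your environment.

// Minimal sketch: reading from a SQL Endpoint with Spark's generic JDBC reader
// The server, database and access token below are hypothetical placeholders
val sqlEndpointServer = "<your-sql-endpoint>.datawarehouse.fabric.microsoft.com" // assumption
val databaseName = "interlake" // assumption

val jdbcUrl = s"jdbc:sqlserver://$sqlEndpointServer:1433;database=$databaseName;encrypt=true"

val df = spark.read
    .format("jdbc")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .option("url", jdbcUrl)
    .option("query", "SELECT TOP 10 * FROM fact_sale_1y_full")
    .option("accessToken", "<Entra access token>") // assumption: token-based authentication
    .load()

display(df)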
However, I was looking for a more “Fabric native” method. There is one. It’s a bit strange, but at the moment, it is the most “Fabric native” method available.
This method requires libraries that are only available in Scala, so the notebook needs to be written in Scala.
Querying the SQL Endpoint in Scala
This is the basic code to query a SQL Endpoint using Scala:
import com.microsoft.spark.fabric.tds.implicits.read.FabricSparkTDSImplicits._
import com.microsoft.spark.fabric.Constants

val df = spark.read
    .option(Constants.WorkspaceId, datausageWorkspaceId)
    .option(Constants.DatabaseName, SQLEndpoint_name)
    .synapsesql(qry.query)
These are the details about this code:
- The two imports only work in Scala
- The two constants contain the strings with the option names we need to use for the read operation
- We need to have the Workspace Id and SQL Endpoint name
- The Workspace is considered the server and the SQL Endpoint is considered the database
- At the end, we execute the query with the “synapsesql” method
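Putting it all together, here is a minimal, self-contained sketch of the same pattern. The workspace ID, SQL Endpoint name and query text are hypothetical placeholders you would replace with your own values.

import com.microsoft.spark.fabric.tds.implicits.read.FabricSparkTDSImplicits._
import com.microsoft.spark.fabric.Constants

// Hypothetical values: replace with your own workspace ID and SQL Endpoint name
val wksId = "00000000-0000-0000-0000-000000000000"
val endpointName = "interlake"
val query = "SELECT TOP 10 * FROM fact_sale_1y_full"

val df = spark.read
    .option(Constants.WorkspaceId, wksId)         // the workspace acts as the server
    .option(Constants.DatabaseName, endpointName) // the SQL Endpoint acts as the database
    .synapsesql(query)

display(df)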
Interaction between pySpark and Scala
My entire environment uses pySpark, while this solution requires Scala.
In some situations, we may need to create an interaction between the pySpark environment and Scala environment.
What we can do:
- The parameters cell can be in Scala and the notebook can be called by pySpark notebooks. No problem with this.
- The Scala cells can call pySpark notebooks.
- By “call” I mean either the “%run” magic or “notebook.run”
What we can’t do:
- Scala cells can’t access pySpark variables, nor the other way around
Exchanging values between languages
You may need to exchange values between the two languages from time to time, depending on the scenario.
The solution for this is to use Spark configurations. In one language, you set a configuration with the variable value; in the other language, you read the configuration to retrieve the value.
Example – Set the configuration in pySpark:
spark.conf.set("lakemirrorSQLEndpointId",lakemirrorSQLEndpointId)
Retrieve the configuration in Scala:
val lakemirrorSQLEndpointId=spark.conf.get("lakemirrorSQLEndpointId")
Exceptions to the basic syntax
The “synapsesql” method doesn’t accept all the syntax supported by a SQL Endpoint.
A query may be perfectly valid and acceptable to the SQL Endpoint, but “synapsesql” will reject it. It doesn’t help that the error message is completely meaningless, taking us in the wrong direction.
Another syntax option: Breaking the query in two
The query can be broken down into two parts. This is only one of the ways to solve the problem. Let’s analyse how this syntax works.
val df = spark.read
    .option(Constants.WorkspaceId, wksId)
    .option(Constants.DatabaseName, EndPointName)
    .option("prepareQuery", firstpart)
    .synapsesql(secondpart)
This option breaks the query into two parts: the first part is sent in the “prepareQuery” option, and the second part is sent using the “synapsesql” method.
The relation between the two parts is made using CTEs. Other options, such as temporary views, would be way more expensive. The solution using CTEs looks like the code below:
%%spark
import com.microsoft.spark.fabric.tds.implicits.read.FabricSparkTDSImplicits._
import com.microsoft.spark.fabric.Constants

val df = spark.read
    .option(Constants.WorkspaceId, "bb457c73-bf10-4f53-8933-8c409192747a")
    .option(Constants.DatabaseName, "interlake")
    .option("prepareQuery", "with qry as (select * from fact_sale_1y_full where CustomerKey=204)")
    .synapsesql("select count(*) as total from qry")

display(df)
The difference between prepareQuery and synapsesql
The “synapsesql” method is processed by Spark. Spark tries to optimize the query and makes additional checks. As a result, some native syntax is not supported in “synapsesql”.
The “prepareQuery” option, on the other hand, sends the query “as is” to the target, so the query is not affected by any Spark processing.
There are many syntaxes which can cause this problem and need to be wrapped in a CTE and sent in two pieces (a sketch follows the list below):
- Common Table Expressions (CTEs)
- Deeply nested queries
- Window functions
- Custom SQL Functions
- Dynamic SQL
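As an example, here is a sketch of a window function (one of the cases above) wrapped in a CTE and sent through “prepareQuery”, following the same pattern as the earlier example. The workspace ID, SQL Endpoint name and the InvoiceDateKey column are hypothetical placeholders.

import com.microsoft.spark.fabric.tds.implicits.read.FabricSparkTDSImplicits._
import com.microsoft.spark.fabric.Constants

// Hypothetical workspace ID and SQL Endpoint name
val wksId = "00000000-0000-0000-0000-000000000000"
val endpointName = "interlake"

// The window function lives inside the CTE sent through "prepareQuery";
// "synapsesql" receives only a simple SELECT over the CTE
val df = spark.read
    .option(Constants.WorkspaceId, wksId)
    .option(Constants.DatabaseName, endpointName)
    .option("prepareQuery",
        """with ranked as (
          |    select CustomerKey,
          |           InvoiceDateKey,
          |           row_number() over (partition by CustomerKey order by InvoiceDateKey desc) as rn
          |    from fact_sale_1y_full
          |)""".stripMargin)
    .synapsesql("select * from ranked where rn = 1")

display(df)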
Summary
After analysing some reasons that would require executing a query on a SQL Endpoint from a notebook, we analysed an interesting method to do this.
This method still has limitations regarding the language, but it’s interesting to notice how it uses Fabric libraries directly.
This is far from being the only method to execute these queries; it all depends on the driver used to connect to the SQL Endpoint.