Configuring Power BI Direct Query to Azure Cosmos DB via Apache Spark (HDI)
A powerfully fun way to visualize your data in Azure Cosmos DB is to use Power BI. While there is an ODBC Driver (refer to Connect to Azure Cosmos DB using BI analytics tools with the ODBC driver), this method requires you to download all of the data from Azure Cosmos DB into Power BI.
To work around this issue, one technique is to use the `azure-cosmosdb-spark` connector, which allows you to use Apache Spark as the bridge between Power BI and Azure Cosmos DB. Power BI has direct query capabilities to Apache Spark, and with the `azure-cosmosdb-spark` connector you can create direct connectivity from Power BI to Azure Cosmos DB.
Note: these are alpha working instructions. Over time we will simplify the steps so that this is easier to configure.
You will need the following components:
- Power BI Desktop,
- an Apache Spark service such as Azure HDInsight Spark, and
- an Azure Cosmos DB subscription.
The key configuration here is the ability to copy the `azure-cosmosdb-spark` JARs to the worker nodes on your HDI cluster.
To get the JARs, build the code using `mvn clean package`, or download them from the releases folder. As of this writing, the latest version of the JARs can be found in azure-cosmosdb-spark-0.0.3_2.0.2_2.11.
Grab these JARs and be prepared to upload them to the head and worker nodes of your HDI cluster.
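Before uploading, it can help to confirm that the release matches the Spark and Scala versions on your cluster. The sketch below is a minimal illustration that assumes the release name follows a `<connector>_<spark>_<scala>` pattern, which is an inference from the name quoted above rather than a documented convention. The `parse_release_name` helper and the `JarVersions` type are hypothetical names introduced here.

```python
import re
from typing import NamedTuple


class JarVersions(NamedTuple):
    connector: str
    spark: str
    scala: str


# Assumed naming pattern: azure-cosmosdb-spark-<connector>_<spark>_<scala>
_RELEASE_NAME = re.compile(
    r"^azure-cosmosdb-spark-"
    r"(?P<connector>[\d.]+)_(?P<spark>[\d.]+)_(?P<scala>[\d.]+)$"
)


def parse_release_name(name: str) -> JarVersions:
    """Split a release name into its three version components."""
    match = _RELEASE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized release name: {name!r}")
    return JarVersions(**match.groupdict())


versions = parse_release_name("azure-cosmosdb-spark-0.0.3_2.0.2_2.11")
print(versions)
```

If the assumption holds, the `spark` and `scala` fields are the values to compare against the versions installed on the HDI cluster.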
The goal here is to copy the `azure-cosmosdb-spark` JARs to the `/usr/hdp/current/spark2-client/jars` directory on the worker and head nodes of your cluster.
To do this, you need a list of your head and worker nodes, which you can copy down from your Azure HDI cluster. First, log in to the Azure Portal and open your HDI cluster, as in the image below.
Click on Cluster Dashboard.
From there, click on HDInsight Cluster Dashboard.
Then click on Hosts to see the list of head nodes (prefix of `hn`) and worker nodes (prefix of `wn`).
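Once you have copied the host list, the prefix rule above is enough to sort it by role. The following is a rough sketch, not part of the connector: the host names, the `sshuser` account, and the `group_hosts` and `copy_targets` helpers are all hypothetical. It only builds destination strings for the JARs directory and does not copy anything.

```python
from typing import Dict, Iterable, List

# Destination directory named in the instructions above.
JARS_DIR = "/usr/hdp/current/spark2-client/jars"


def group_hosts(hostnames: Iterable[str]) -> Dict[str, List[str]]:
    """Group host names into head (hn) and worker (wn) nodes by prefix."""
    groups: Dict[str, List[str]] = {"head": [], "worker": []}
    for host in hostnames:
        if host.startswith("hn"):
            groups["head"].append(host)
        elif host.startswith("wn"):
            groups["worker"].append(host)
        # Hosts with any other prefix are ignored.
    return groups


def copy_targets(groups: Dict[str, List[str]], user: str = "sshuser") -> List[str]:
    """Build user@host:path destinations for head nodes, then worker nodes."""
    return [
        f"{user}@{host}:{JARS_DIR}/"
        for role in ("head", "worker")
        for host in groups[role]
    ]


# Hypothetical host names, as they might be copied from the Hosts page.
hosts = ["hn0-demo.example.internal", "wn0-demo.example.internal",
         "wn1-demo.example.internal", "zk0-demo.example.internal"]
groups = group_hosts(hosts)
for target in copy_targets(groups):
    print(target)
```

Each printed destination is one place the JARs need to go. The SSH user name and whether that user can write to the directory depend on how your cluster is set up.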