title	titleSuffix	description	author	ms.author	ms.reviewer	ms.date	ms.service	ms.subservice	ms.topic	monikerRange
Tutorial: Develop a clustering model in R	SQL Machine Learning	In this four-part tutorial series, you'll develop a model to perform clustering in R with SQL machine learning.	VanMSFT	vanto	garye, jroth	05/29/2024	sql	machine-learning	tutorial	>=sql-server-2016\|\|>=sql-server-linux-ver15\|\|=azuresqldb-mi-current

Tutorial: Develop a clustering model in R with SQL machine learning

[!INCLUDE SQL Server 2016 SQL MI]

::: moniker range=">=sql-server-ver15||>=sql-server-linux-ver15" In this four-part tutorial series, you'll use R to develop and deploy a K-Means clustering model in SQL Server Machine Learning Services or on Big Data Clusters to categorize customer data. ::: moniker-end ::: moniker range="=sql-server-2017" In this four-part tutorial series, you'll use R to develop and deploy a K-Means clustering model in SQL Server Machine Learning Services to cluster customer data. ::: moniker-end ::: moniker range="=sql-server-2016" In this four-part tutorial series, you'll use R to develop and deploy a K-Means clustering model in SQL Server R Services to cluster customer data. ::: moniker-end ::: moniker range="=azuresqldb-mi-current" In this four-part tutorial series, you'll use R to develop and deploy a K-Means clustering model in Azure SQL Managed Instance Machine Learning Services to cluster customer data. ::: moniker-end

In part one of this series, you'll set up the prerequisites for the tutorial and then restore a sample dataset to a database. In parts two and three, you'll develop some R scripts in an Azure Data Studio notebook to analyze and prepare this sample data and train a machine learning model. Then, in part four, you'll run those R scripts inside a database using stored procedures.

Clustering can be explained as organizing data into groups where members of a group are similar in some way. For this tutorial series, imagine you own a retail business. You'll use the K-Means algorithm to perform the clustering of customers in a dataset of product purchases and returns. By clustering customers, you can focus your marketing efforts more effectively by targeting specific groups. K-Means clustering is an unsupervised learning algorithm that looks for patterns in data based on similarities.

In this article, you'll learn how to:

[!div class="checklist"]

Restore a sample database

In part two, you'll learn how to prepare the data from a database to perform clustering.

In part three, you'll learn how to create and train a K-Means clustering model in R.

In part four, you'll learn how to create a stored procedure in a database that can perform clustering in R based on new data.

Prerequisites

::: moniker range=">=sql-server-ver15||>=sql-server-linux-ver15"

SQL Server Machine Learning Services with the Python language option - Follow the installation instructions in the Windows installation guide or the Linux installation guide. You can also enable Machine Learning Services on SQL Server Big Data Clusters. ::: moniker-end ::: moniker range="=sql-server-2017"
SQL Server Machine Learning Services with the R language option - Follow the installation instructions in the Windows installation guide. ::: moniker-end ::: moniker range="=azuresqldb-mi-current"
Azure SQL Managed Instance Machine Learning Services. For information, see the Azure SQL Managed Instance Machine Learning Services overview.
SQL Server Management Studio (SSMS) - Use SSMS to restore the sample database to Azure SQL Managed Instance. To download, see SQL Server Management Studio. ::: moniker-end
Azure Data Studio. You'll use a notebook in Azure Data Studio for SQL. For more information about notebooks, see How to use notebooks in Azure Data Studio.
R IDE - This tutorial uses RStudio Desktop.
RODBC - This driver is used in the R scripts you'll develop in this tutorial. If it's not already installed, install it using the R command install.packages("RODBC"). For more information on RODBC, see CRAN - Package RODBC.

Restore the sample database

The sample dataset used in this tutorial has been saved to a .bak database backup file for you to download and use. This dataset is derived from the tpcx-bb dataset provided by the Transaction Processing Performance Council (TPC).

::: moniker range=">=sql-server-ver15||>=sql-server-linux-ver15"

Note

If you are using Machine Learning Services on Big Data Clusters, see how to Restore a database into the SQL Server big data cluster master instance. ::: moniker-end

::: moniker range=">=sql-server-2017||>=sql-server-linux-ver15"

Download the file tpcxbb_1gb.bak.
Follow the directions in Restore a database from a backup file in Azure Data Studio, using these details:
- Import from the tpcxbb_1gb.bak file you downloaded.
- Name the target database tpcxbb_1gb.
You can verify that the dataset exists after you have restored the database by querying the dbo.customer table:
```
USE tpcxbb_1gb;
SELECT * FROM [dbo].[customer];
```

::: moniker-end ::: moniker range="=azuresqldb-mi-current"

Download the file tpcxbb_1gb.bak.
Follow the directions in Restore a database to a Managed Instance in SQL Server Management Studio, using these details:
- Import from the tpcxbb_1gb.bak file you downloaded.
- Name the target database tpcxbb_1gb.
You can verify that the dataset exists after you have restored the database by querying the dbo.customer table:
```
USE tpcxbb_1gb;
SELECT * FROM [dbo].[customer];
```

::: moniker-end

Clean up resources

If you're not going to continue with this tutorial, delete the tpcxbb_1gb database.

Next step

In part one of this tutorial series, you completed these steps:

Installed the prerequisites
Restored a sample database

To prepare the data for the machine learning model, follow part two of this tutorial series:

[!div class="nextstepaction"] Prepare data to perform clustering

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

r-clustering-model-introduction.md

r-clustering-model-introduction.md

Tutorial: Develop a clustering model in R with SQL machine learning

Prerequisites

Restore the sample database

Clean up resources

Next step

Files

r-clustering-model-introduction.md

Latest commit

History

r-clustering-model-introduction.md

File metadata and controls

Tutorial: Develop a clustering model in R with SQL machine learning

Prerequisites

Restore the sample database

Clean up resources

Next step