This repository has been archived by the owner on Feb 17, 2025. It is now read-only.

Tx Scala UDF

Important: These instructions assume you have access to StreamSets Transformer.

Here is a link to a short video on using this pipeline template: Video Link

OVERVIEW

This pipeline demonstrates how to create, register, and use a User-Defined Function in Scala using StreamSets Transformer.

The source data for this pipeline is included in the Dev Raw Data Source stage as an example. Typically, you would replace it with your actual source (JDBC, files, etc.). This template writes data to a file on the local file system, but you would typically replace that with your actual destination.

Disclaimer: This pipeline is meant to serve as a template for creating, registering, and using a User-Defined Function in Scala.

USING THE TEMPLATE

NOTE: Templates are supported in StreamSets Control Hub. If you do not have Control Hub, you can import the template pipeline in Data Collector, but you will need to do that each time you want to use the template.

PIPELINE

Pipeline Description with links to documentation

| Stage | Description |
| --- | --- |
| Dev Raw Data Source | Generates records based on user-supplied data |
| Create UDFs | Creates a small example function and registers it with SparkSQL as a column function |
| Use UDF | Leverages created UDF as a SparkSQL Expression Function |
| Write udf | Writes data to a local file system |
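
The Create UDFs and Use UDF stages above can be approximated outside Transformer with plain Spark. The following is a minimal sketch, not the code shipped in the template: the function `shout`, the column names `name` and `name_upper`, and the local `SparkSession` are all hypothetical and chosen only for illustration. Inside Transformer, the Spark session and the input data would be supplied by the pipeline rather than created by hand as shown here. The sketch assumes the `spark-sql` library is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object ScalaUdfSketch {
  // Hypothetical example function: plain Scala, null-safe because
  // Spark may pass null for missing string values.
  val shout: String => String = s => if (s == null) null else s.toUpperCase

  // "Create UDFs" step: register the function with Spark SQL under a name
  // that SQL expressions can call.
  def registerUdfs(spark: SparkSession): Unit = {
    spark.udf.register("shout", shout)
  }

  // "Use UDF" step: call the registered function from a Spark SQL expression.
  // The column names are assumptions made for this sketch.
  def applyUdf(df: DataFrame): DataFrame =
    df.withColumn("name_upper", expr("shout(name)"))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("scala-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    registerUdfs(spark)
    val input = Seq("alice", "bob").toDF("name")
    applyUdf(input).show()

    spark.stop()
  }
}
```

Registering the function by name, rather than wrapping it with `udf(...)` and calling it only from Scala, is what makes it available to SQL expression strings, which matches how the Use UDF stage is described.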

STEP-BY-STEP

Step 1: Download the pipeline

Click Here to download the pipeline and save it to your drive.

Step 2: Import the pipeline

Click the down arrow next to "Create New Pipeline" and select "Import Pipeline From Archive".

Click "Browse", locate the pipeline file you just downloaded, and click "OK". Then click "Import".

Step 3: Configure the parameters

Click the pipeline you just imported to open it, then click the "Parameters" tab and fill in the appropriate information for your environment.

Important: For this pipeline, you only need to specify the output directory for the file. This directory is on the local file system of the machine where Transformer is installed. Make sure the directory exists and that its permissions allow the Transformer user to create files in it. The default directory is /data/udf, but you can change it to any path you want.

The following parameters are set up for this pipeline:

destination_directory: Path to the directory for the output files.

Use the following format:

/<directory>
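
Before running the pipeline, it can help to confirm that the output directory exists and is writable. The following is a minimal sketch using only the Java standard library from Scala; the object and method names are hypothetical. It checks permissions for whichever OS user runs it, so the result is only meaningful when it is run as the same user that runs Transformer.

```scala
import java.nio.file.{Files, Path, Paths}

object OutputDirCheck {
  // Creates the directory (and any missing parents) if it does not exist,
  // then reports whether the current OS user can write to it.
  // Files.createDirectories throws an exception if the directory cannot be
  // created, for example because of missing permissions on a parent.
  def ensureWritableDir(dir: Path): Boolean = {
    Files.createDirectories(dir)
    Files.isDirectory(dir) && Files.isWritable(dir)
  }

  def main(args: Array[String]): Unit = {
    // Falls back to the template's default directory when no argument is given.
    val dir = Paths.get(args.headOption.getOrElse("/data/udf"))
    println(s"$dir writable: ${ensureWritableDir(dir)}")
  }
}
```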

Step 4: Run the pipeline

Click the "START" button to run the pipeline.
