- Author: Stephen Mazzei
- Organization: Dataiku
- Last Update: 2024-08-22
- This plugin contains 2 key components:
- K8S Poll-Data: This is the macro that runs the cloud provider utilities and the Kubernetes API to gather the raw metrics (Required)
- Cleanse K8S Data: This is the Python recipe that cleans the raw data into a cleansed state (Not Required)
-
Download/Install the plugin
-
Create a new Dataiku Project (the code works with UIF enabled or disabled)
-
Create a new scenario
- Name = "Poll K8S Data"
- Trigger = Time-based, every 5 minutes
- Steps = Execute Macro, "Poll K8S Data"
- Cluster Name
- Cluster Type
- Cloud Provider Information
- This information will vary per AWS | Azure | GCP
- Folder Connection Name (Example S3 Connection: my-bucket-value) (Local/Cloud)
- Folder Name
- Run Scenario
-
Update the new folder in the flow for partitioning
- Add 2 "Dimensions" partitions
- From Recipe dropdown in flow, select "Kubernetes Monitoring"
- Select Cleanse K8S Data
- Select the Raw folder for input, and create a new folder for output
- NOTE: Under the "Advanced" tab, you may need to disable "Container Configuration", depending on the DSS setup
- Run
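The two partition "Dimensions" typically map onto components of the file paths inside the raw folder. As a rough sketch of how such paths decompose into partition values, assuming a hypothetical `/<day>/<hour>/<file>.json` layout (check the actual paths the macro writes and adjust the pattern accordingly):

```python
import re

# Hypothetical layout: assumes the macro writes files under
# /<day>/<hour>/<name>.json -- adjust to the real paths in your raw folder.
PARTITION_RE = re.compile(r"^/(?P<day>\d{4}-\d{2}-\d{2})/(?P<hour>\d{2})/[^/]+$")

def partition_values(path):
    """Return the (day, hour) partition values encoded in a folder path,
    or None if the path does not match the assumed layout."""
    m = PARTITION_RE.match(path)
    if not m:
        return None
    return (m.group("day"), m.group("hour"))
```

This is only an illustration of what "2 Dimensions" means for a path-partitioned folder; the dimension names and pattern you configure in DSS must match your own folder contents.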
- Create a new scenario
- Name = "Cleanse K8S Data"
- Trigger = Time-based, every hour
- Steps:
- Build "Cleanse Folder"
- Macro -- "Clear Scenario Run Logs" -- keep only the last 2-3 days (the project creates a lot of logs)
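The effect of the "Clear Scenario Run Logs" macro is roughly a retention window over run history. A minimal sketch of that logic, assuming a 3-day cutoff and date-stamped runs (both assumptions for illustration, not the macro's actual implementation):

```python
from datetime import date, timedelta

def runs_to_keep(run_dates, today, keep_days=3):
    """Given scenario run dates, return those inside the retention
    window; everything older is what the cleanup macro would delete."""
    cutoff = today - timedelta(days=keep_days)
    return [d for d in run_dates if d >= cutoff]
```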
- From either folder, you can create datasets based on the folder paths.
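Building a dataset from a folder essentially turns each file path into a row, with the partition components exposed as columns. A minimal sketch of that mapping (the `/<day>/<hour>/<file>` layout and the column names here are illustrative assumptions, not the plugin's actual schema):

```python
def paths_to_rows(paths):
    """Turn folder paths of the assumed form /<day>/<hour>/<file>
    into dataset-style rows with the partition parts as columns."""
    rows = []
    for p in paths:
        parts = p.strip("/").split("/")
        if len(parts) != 3:
            continue  # skip paths that do not match the assumed layout
        day, hour, name = parts
        rows.append({"day": day, "hour": hour, "file": name})
    return rows
```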