Description
There are two new amazing notebooks from Databricks which will fit in very well here. The first one is similar to our demo that already exists, and the second is a new notebook which can be used as bonus material.
NOTE: In this issue, we'll update the OLD delta-lake-walkthrough
- Download the two notebooks here: Archive.zip
- Import the 00-Deltalake Notebook
- Compare our existing Delta Lake exercise (delta-lake-walkthrough) with this one and decide on a merging strategy (e.g. what content to keep and what to edit)
- Remove all Databricks-demo specific text that doesn't pertain to our content (e.g. "a cluster has been created for you...")
- Add our per-user workspace selector and stream helpers (if they don't already exist): https://github.com/data-derp/small-exercises/blob/master/delta-lake-walkthrough/delta-lake-walkthrough.py#L31-L150
- Add at the top of the notebook: "This notebook is adapted from the Delta Lake Demo provided by Databricks".
- Export the notebook as Python source (it might turn out to be SQL source, but let's see)
- Upload it to the same directory in the small-exercises repo
- Add extra explanations for questions that might come up (e.g. optimize) at the bottom of this page: https://data-derp.github.io/docs/2.0/making-big-data-work/exercise-delta-lake
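For the attribution step, a minimal sketch of how the line could be added to an exported notebook, assuming the export is a Databricks Python source file (first line `# Databricks notebook source`, cells separated by `# COMMAND ----------`, markdown cells written as `# MAGIC` comments). The function name `add_attribution` is hypothetical and not part of the small-exercises repo; adding the cell by hand in the notebook UI works just as well.

```python
# Assumption: the notebook was exported as Databricks Python source.
ATTRIBUTION = "This notebook is adapted from the Delta Lake Demo provided by Databricks"
HEADER = "# Databricks notebook source"
CELL_SEPARATOR = "# COMMAND ----------"


def add_attribution(source: str, text: str = ATTRIBUTION) -> str:
    """Insert a markdown cell containing `text` as the first cell of the notebook.

    Hypothetical helper for illustration only.
    """
    lines = source.splitlines()
    if not lines or lines[0].strip() != HEADER:
        raise ValueError("not a Databricks Python source export")

    # Leave the notebook untouched if the attribution is already present.
    if any(text in line for line in lines):
        return source

    # Drop blank lines between the header and the first existing cell.
    body = lines[1:]
    while body and not body[0].strip():
        body.pop(0)

    # A markdown cell is a run of "# MAGIC" comment lines starting with %md.
    cell = ["# MAGIC %md", f"# MAGIC {text}", "", CELL_SEPARATOR, ""]
    return "\n".join([HEADER] + cell + body) + "\n"
```

Running it twice on the same file is harmless, since the second call returns the source unchanged.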
OLD CONTENT
Let's make sure that our Delta Lake exercise is working and up to date.
Context: this ticket is a revamp of the old one. We don't know what state the "updated" Delta Lake exercise is in, so we'll first check it, then update it, and then add a new Delta Lake exercise.
This is no longer relevant because the notebook is no longer at this URL.