Add datascience notebook and github actions #1

Status: Open. Wants to merge 61 commits into base: main.
Changes from 14 commits

Commits (61)
e8fa194
Add datascience notebook and github actions
AnchorArray Aug 8, 2022
4d1b838
add action on PR
AnchorArray Aug 8, 2022
f4796aa
add action on PR
AnchorArray Aug 8, 2022
38cc6b7
Update .github/workflows/main.yaml
AnchorArray Oct 19, 2022
1d8427b
Update .github/workflows/main.yaml
AnchorArray Oct 19, 2022
4486d82
Update .github/workflows/main.yaml
AnchorArray Oct 19, 2022
ba915e7
Update .github/workflows/main.yaml
AnchorArray Oct 19, 2022
db1e20d
Update .github/workflows/main.yaml
AnchorArray Oct 19, 2022
1ed41a9
address comments (and test without qemu)
AnchorArray Oct 19, 2022
b0dc6d9
Update .github/workflows/main.yaml
AnchorArray Oct 19, 2022
05d6114
address comment suggestion
AnchorArray Nov 2, 2022
2dc8941
debug
AnchorArray Nov 2, 2022
5586d3d
remove pinned version from buildx for testing
AnchorArray Nov 3, 2022
5e7c361
self-hosted runner
AnchorArray Nov 3, 2022
334d653
Update .github/workflows/main.yaml
AnchorArray Nov 3, 2022
8a6fc03
Update .github/workflows/main.yaml
AnchorArray Nov 3, 2022
d1a0b61
Update .github/workflows/main.yaml
AnchorArray Nov 3, 2022
5e8fcf2
Update kernels/datascience-notebook/Dockerfile
AnchorArray Nov 3, 2022
16e87f3
Update kernels/datascience-notebook/Dockerfile
AnchorArray Nov 3, 2022
5ee2de1
Empty-Commit
AnchorArray Nov 4, 2022
aa23a2a
Merge branch 'add-ds-notebook' of https://github.com/noteable-io/kern…
AnchorArray Nov 4, 2022
7b636c2
update
AnchorArray Nov 4, 2022
37b0044
revert to v2, there is no v3
AnchorArray Nov 4, 2022
0806f95
triggering build for refresh
AnchorArray Jan 12, 2023
cc011de
Revert "add action on PR"
AnchorArray Jan 27, 2023
e83ae07
revert to working
AnchorArray Jan 27, 2023
4978b8c
change cache
AnchorArray Jan 27, 2023
054c258
update versions
AnchorArray Jan 27, 2023
8322242
Update to latest version of notebook image with new features
AnchorArray Jan 27, 2023
929c8e6
rename file
AnchorArray Jan 27, 2023
edef0f3
adding missing envs
AnchorArray Jan 27, 2023
4b6ed79
debug
AnchorArray Jan 27, 2023
958ba3d
revert
AnchorArray Jan 27, 2023
963bdcf
Merge branch 'main' into add-ds-notebook
AnchorArray Mar 9, 2023
3d5f98f
Merge branch 'main' of https://github.com/noteable-io/kernels into ad…
AnchorArray Mar 9, 2023
953c2b0
add 3.7
AnchorArray Mar 9, 2023
e05070e
Merge branch 'add-ds-notebook' of https://github.com/noteable-io/kern…
AnchorArray Mar 9, 2023
c185dab
change to amd...
AnchorArray Mar 9, 2023
0ce6239
update packages
AnchorArray Mar 9, 2023
ff22605
Add environment specific requirements
AnchorArray Mar 9, 2023
8e651a2
missing change
AnchorArray Mar 9, 2023
acd10bc
remove reusable workflow for now
AnchorArray Mar 9, 2023
47860e4
dockerfile updates
AnchorArray Mar 9, 2023
2230b39
build arg not propagating
AnchorArray Mar 9, 2023
6f1b36f
bump versions
AnchorArray Mar 9, 2023
dcd48ed
debug
AnchorArray Mar 9, 2023
c1d5129
debug
AnchorArray Mar 9, 2023
577ee69
debug
AnchorArray Mar 9, 2023
af43735
debug
AnchorArray Mar 9, 2023
930f5a6
debug
AnchorArray Mar 9, 2023
2744114
debug
AnchorArray Mar 9, 2023
f4bce28
debug
AnchorArray Mar 9, 2023
0035088
debug
AnchorArray Mar 9, 2023
1684f0c
debug
AnchorArray Mar 9, 2023
f69da75
debug
AnchorArray Mar 10, 2023
6a653ed
debug
AnchorArray Mar 10, 2023
8117176
debug
AnchorArray Mar 10, 2023
4cff673
debug
AnchorArray Mar 10, 2023
22d87d3
debug
AnchorArray Mar 10, 2023
b565344
debug
AnchorArray Mar 10, 2023
caaa4e2
add 3.10.9
AnchorArray Mar 10, 2023
151 changes: 151 additions & 0 deletions .github/workflows/main.yaml
@@ -0,0 +1,151 @@
name: Build kernel images

on:
  push:
    branches:
      - main

  pull_request:
    branches:
      - main

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:

  build-python-images:
    runs-on: kubernetes-organization-runner
    strategy:
      matrix:
        # version: ["3.8.8", "3.9.13", "3.10.5"] 3.10.5 fails with dependency conflicts
        # We may need to have separate requirements.txt for each version, or replace
        # dependencies on the fly
        version: ["3.8.13", "3.9.13"]
        directory: ["datascience-notebook"]
        # The datascience-notebook base image does not support ARM
        # We would need to build and maintain our own base image
        # architecture: ["arm", "amd"]
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          endpoint: github-action

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v3
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=raw,value=latest,enable=${{ endsWith(github.ref, github.event.repository.default_branch) }}
            type=schedule
            type=ref,event=branch
            type=ref,event=tag
            type=ref,event=pr
          labels: |
            org.opencontainers.image.source=${{ github.server_url }}/${{ github.repository }}

      - name: Get current time
        uses: josStorer/get-current-time@84e5c63cf4cc28dc797be7bb0bfc0171b8c468ce
        id: current-time

# - name: Create context
# run: |
# docker context create github-action

# - name: Cache Docker layers
# uses: actions/cache@v3
# id: docker-cache
# with:
# path: "/tmp/.buildx-cache"
# key: "${{ runner.os }}-${{env.RUNNER_ARCH}}-buildx-${{ matrix.directory }}-${{ matrix.version }}"
# restore-keys: "${{ runner.os }}-${{env.RUNNER_ARCH}}-buildx-${{ matrix.directory }}-"

# - name: Build arguments
# id: build-args
# run: |
# # Image Name
# container_registry=ghcr.io/${{ github.repository_owner }}
# image_name=kernel-${{ matrix.directory }}
# full_image_name="${container_registry}/${image_name}"

# # Image Tags
# image_sha_tag="${GITHUB_SHA:0:12}" # first 12 characters of the SHA
# image_version_tag="python-$(version=${{ matrix.version }} && echo ${version%.*} )" # removes patch version

# full_image_name_tagged=''

# if [ "${GITHUB_EVENT_NAME}" = 'push' ]; then
# full_image_name_tagged="${full_image_name}:${image_version_tag}"
# elif [ "${GITHUB_EVENT_NAME}" = 'pull_request' ]; then
# full_image_name_tagged="${full_image_name}:${image_version_tag}-${image_sha_tag}"
# fi

# echo "::set-output name=FULL_IMAGE_NAME::${full_image_name}"
# echo "::set-output name=FULL_IMAGE_NAME_TAGGED::${full_image_name_tagged}"

# echo "::set-output name=BUILD_URL::https://github.com/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}"
# echo "::set-output name=BUILD_TIMESTAMP::$(date --utc --iso-8601=seconds)"

# echo "full_image_name: $full_image_name"
# echo "image_version_tag: $image_version_tag"
# echo "image_sha_tag: $image_sha_tag"
# echo "full_image_name_tagged: $full_image_name_tagged"

# - name: Build image
# env:
# DOCKER_CONTENT_TRUST: 1
# DOCKER_CONTEXT: github-action
# run: |
# (
# cd ${GITHUB_WORKSPACE}/kernels/${{ matrix.directory }}

# docker buildx build \
# --pull \
# --output 'type=docker' \
# --platform=linux/arm64 \
# --progress plain \
# --cache-from 'type=local,src=/tmp/.buildx-cache' \
# --cache-to 'type=local,dest=/tmp/.buildx-cache' \
# --tag '${{ steps.build-args.outputs.FULL_IMAGE_NAME_TAGGED }}' \
# --build-arg PYTHON_VERSION=${{ matrix.version }} \
# --build-arg 'NBL_ARG_BUILD_TIMESTAMP=${{ steps.build-args.outputs.BUILD_TIMESTAMP }}' \
# --build-arg 'NBL_ARG_BUILD_URL=${{ steps.build-args.outputs.BUILD_URL }}' \
# --build-arg 'NBL_ARG_REVISION=${{ github.sha }}' \
# --build-arg 'NBL_ARG_VERSION=${{ github.ref }}' \
# .
# )

# - name: Publish image
# run: |
# docker push --all-tags ${{ steps.build-args.outputs.FULL_IMAGE_NAME }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v2
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          secrets: |
            "expel_artifactory_connection_url_file=${{ secrets.EXPEL_ARTIFACTORY_CONNECTION_URL }}"
            "git-credentials=${{ secrets.GIT_CREDENTIALS }}"
          build-args: |
            "NBL_ARG_BUILD_TIMESTAMP=${{ steps.current-time.outputs.formattedTime }}"
            "NBL_ARG_REVISION=${{ github.sha }}"
            "NBL_ARG_BUILD_URL=${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
            "NBL_ARG_VERSION=${{ github.ref }}"
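
The commented-out `Build arguments` step in this workflow derives the image name and tag in shell. The same rules can be sketched in Python to make them easier to check. This is only an illustration: the function name `build_image_tag` and the sample owner and SHA are hypothetical and do not appear in the PR.

```python
def build_image_tag(owner: str, directory: str, version: str, sha: str, event: str) -> str:
    """Mirror the tagging rules of the commented-out 'Build arguments' step."""
    full_image_name = f"ghcr.io/{owner}/kernel-{directory}"
    # Drop the patch component, like ${version%.*} in the shell step.
    version_tag = "python-" + version.rsplit(".", 1)[0]
    # Keep the first 12 characters of the commit SHA, like ${GITHUB_SHA:0:12}.
    sha_tag = sha[:12]
    if event == "push":
        return f"{full_image_name}:{version_tag}"
    if event == "pull_request":
        return f"{full_image_name}:{version_tag}-{sha_tag}"
    # The shell step leaves the tagged name empty for any other event.
    return ""


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(build_image_tag("example-org", "datascience-notebook", "3.9.13",
                          "0123456789abcdef0123", "pull_request"))
```

The active `docker/metadata-action` step does not appear to include `matrix.version` in its tags, so a version-qualified tag along these lines may still be needed to keep the two matrix builds from pushing to the same tag.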
3 changes: 3 additions & 0 deletions kernels/datascience-notebook/.pythonrc
@@ -0,0 +1,3 @@
import pandas as pd

import dx
116 changes: 116 additions & 0 deletions kernels/datascience-notebook/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
# syntax = docker/dockerfile:1.4.1
ARG BASE_IMAGE=jupyter/datascience-notebook
ARG PYTHON_VERSION=3.9.13
# hadolint ignore=DL3006
FROM ${BASE_IMAGE}:python-${PYTHON_VERSION}

USER root

# Set up log file for magics
RUN touch /var/log/noteable_magics.log && \
chown 4004:4004 /var/log/noteable_magics.log

# When image is run, run the code with the environment
# activated:
SHELL ["/bin/bash", "-c"]

WORKDIR /tmp

# hadolint ignore=DL3008,DL3015
RUN apt-get update && \
apt-get install -y jq procps git unixodbc-dev g++ \
&& rm -rf /var/lib/apt/lists/*

ENV TINI_VERSION=v0.19.0
RUN TINI_BINARY=$(if [ "$(uname -m)" = "aarch64" ]; then echo "tini-arm64"; else echo "tini"; fi); echo "${TINI_BINARY}" \
&& wget -q -O /usr/local/bin/tini "https://github.com/krallin/tini/releases/download/${TINI_VERSION}/${TINI_BINARY}" \
&& chmod +x /usr/local/bin/tini

ENV NB_USER="noteable" \
NB_UID=4004 \
NB_GID=4004

# Create the default unprivileged user
RUN groupadd --gid 4004 noteable && \
useradd --uid 4004 --shell /bin/false --create-home --no-log-init --gid noteable noteable && \
chown --recursive noteable:noteable /home/noteable

RUN mkdir /etc/ipython && chown noteable:noteable /etc/ipython
RUN mkdir -p /etc/noteable && chown noteable:noteable /etc/noteable

RUN chown noteable:noteable "${JULIA_PKGDIR}" && \
chown noteable:noteable "${CONDA_DIR}" && \
fix-permissions "${JULIA_PKGDIR}" && \
fix-permissions "${CONDA_DIR}"

# Run as the non-privileged user
USER noteable

ENV PATH="/home/noteable/.local/bin:${PATH}" \
HOME="/home/noteable" \
XDG_CACHE_HOME="/home/noteable/.cache/" \
GOOGLE_APPLICATION_CREDENTIALS="/vault/secrets/gcp-credentials"

# hadolint ignore=DL3045
COPY environment.txt ./

# hadolint ignore=DL3045
COPY requirements.txt ./

# hadolint ignore=SC2034
RUN conda install --file environment.txt

# hadolint ignore=DL3045
COPY requirements.txt ./

# hadolint ignore=SC1008,SC2155,DL3042,SC2102
RUN pip install -I --quiet --no-cache-dir "git+https://github.com/noteable-io/noteable-notebook-magics.git@main" && \
pip install -I --quiet --no-cache-dir -r requirements.txt

# Copy over any python commands that need to run on startup
# that aren't covered by IPython extensions
COPY .pythonrc /home/noteable/.pythonrc

# Enable the widgets nbextension
# hadolint ignore=SC1008
RUN jupyter nbextension enable --py --sys-prefix widgetsnbextension

# Smoke test to ensure packages were installed properly
# hadolint ignore=SC1008
RUN python -c "import noteable_magics"

RUN git config --global user.name "Noteable Kernel" && \
git config --global user.email "[email protected]"

# https://ipython.readthedocs.io/en/stable/config/intro.html#systemwide-configuration
COPY ipython_config.py /etc/ipython

# Set standard working directory for noteable project
WORKDIR /etc/noteable/project

# Add the entrypoint script to the $PATH
COPY run.sh /usr/local/bin
COPY secrets_helper.py /tmp/secrets_helper.py

EXPOSE 50001-50005

# Use tini to manage passing signals to the child kernel process
# -g will ensure signals are passed to the entire child process *group*,
# not just the immediate child process (bash)
# https://github.com/krallin/tini#process-group-killing
ENTRYPOINT ["tini", "-g", "--"]
CMD ["run.sh"]

# Labels
ARG NBL_ARG_BUILD_TIMESTAMP="undefined"
ARG NBL_ARG_REVISION="undefined"
ARG NBL_ARG_PYTHON_VERSION="3.9.6"
ARG NBL_ARG_BUILD_URL="undefined"
ARG NBL_ARG_VERSION="undefined"
LABEL org.opencontainers.image.created="${NBL_ARG_BUILD_TIMESTAMP}" \
org.opencontainers.image.revision="${NBL_ARG_REVISION}" \
org.opencontainers.image.source="https://github.com/noteable-io/polymorph" \
org.opencontainers.image.title="noteable-python-${NBL_ARG_PYTHON_VERSION}" \
org.opencontainers.image.url="${NBL_ARG_BUILD_URL}" \
org.opencontainers.image.vendor="Noteable" \
org.opencontainers.image.version="${NBL_ARG_VERSION}"
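
The `TINI_BINARY` step in this Dockerfile picks the release asset from the output of `uname -m`. A minimal Python sketch of that selection follows, assuming the same two cases as the Dockerfile (`aarch64` versus everything else). The function name `tini_download_url` is hypothetical.

```python
import platform


def tini_download_url(machine: str, version: str = "v0.19.0") -> str:
    """Return the tini release URL for a `uname -m` value, as the RUN step does."""
    # The Dockerfile only special-cases aarch64; every other value gets plain "tini".
    binary = "tini-arm64" if machine == "aarch64" else "tini"
    return f"https://github.com/krallin/tini/releases/download/{version}/{binary}"


if __name__ == "__main__":
    # platform.machine() reports the same kind of value as `uname -m`.
    print(tini_download_url(platform.machine()))
```

Because the workflow comments state that the base image does not support ARM, the `aarch64` branch would presumably only matter if a custom base image were introduced later.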
20 changes: 20 additions & 0 deletions kernels/datascience-notebook/README.md
@@ -0,0 +1,20 @@
# Multitenant Python Image

The entrypoint is used to implement signal-based interrupts, since `ipykernel` does not support message-based interrupts.

## Building Locally
You'll need to provide a git credential string located at `${HOME}/.git-credentials`:

```shell
echo "${GITHUB_USER_NAME}:${GITHUB_PERSONAL_ACCESS_TOKEN}" > ${HOME}/.git-credentials
```

The [personal access token](https://github.com/settings/tokens) needs to have
the `read:packages` and `repo` scopes (and make sure to enable SSO on it).

```shell
# Optional step to help you auto-load your built docker container into minikube for use with Gate
eval $(minikube docker-env)

DOCKER_BUILDKIT=1 docker build --secret "id=git-credentials,src=${HOME}/.git-credentials" -t local/noteable-python:latest .
```
7 changes: 7 additions & 0 deletions kernels/datascience-notebook/environment.txt
@@ -0,0 +1,7 @@
ipykernel=5.5.*
ipython=8.0.*
vdom=0.6
papermill=2.2.*
ipywidgets=7.6.*
plotly=4.14.3
geopandas=0.11.0
12 changes: 12 additions & 0 deletions kernels/datascience-notebook/ipython_config.py
@@ -0,0 +1,12 @@
c.InteractiveShellApp.extensions = [
"sql",
"noteable_magics",
]

c.SqlMagic.feedback = False
c.SqlMagic.autopandas = True
c.NTBLMagic.project_dir = "/etc/noteable/project"
c.NoteableDataLoaderMagic.return_head = False
c.IPythonKernel._execute_sleep = 0.15
# 10 minutes to support large files
c.NTBLMagic.planar_ally_default_timeout_seconds = 600
12 changes: 12 additions & 0 deletions kernels/datascience-notebook/requirements.txt
@@ -0,0 +1,12 @@
dx==1.1.2
# Datasources-related packages here on down, alphabetized please for easy cut/paste across files and repos.
google-cloud-bigquery-storage==2.6.3
psycopg2-binary==2.9.3
pyodbc==4.0.32
redshift_connector==2.0.907
snowflake_sqlalchemy==1.3.4
sqlalchemy-bigquery==1.3.0
sqlalchemy-databricks==0.2.0
sqlalchemy-redshift==0.8.9
trino[sqlalchemy]==0.313.0
astroid==2.12.2
50 changes: 50 additions & 0 deletions kernels/datascience-notebook/run.sh
@@ -0,0 +1,50 @@
#!/usr/bin/env bash
set -o pipefail
set -o nounset
set -o errexit

echo "Local time: $(date)"

set -x

connection_file=/tmp/connection_file.json

cp /etc/noteable/connections/connection_file.json "${connection_file}"

kernel_name=$(jq -r .kernel_name "${connection_file}")

# Inject Secrets into environment (see script docstring for more info)
# set +x to avoid echoing the Secrets in plaintext to logs
set +x
echo "Injecting Secrets into environment, echoing is turned off"
eval "$(python /tmp/secrets_helper.py)"
echo "Done injecting Secrets, turning echoing back on"
set -x

case $kernel_name in

python | python3)
echo "Starting Python kernel"
# https://docs.python.org/3/using/cmdline.html#envvar-PYTHONSTARTUP
export PYTHONSTARTUP=~/.pythonrc
python -m ipykernel_launcher -f ${connection_file} --debug
;;

ir)
echo "Starting R kernel"
R --slave -e "IRkernel::main()" --args ${connection_file}
;;

julia | julia-1.6)
echo "Starting Julia kernel"
# project path necessary to keep julia from using its defaults
julia -i --color=yes --project=/etc/noteable/project /opt/julia/packages/IJulia/e8kqU/src/kernel.jl ${connection_file}
;;

*)
echo "Unrecognized '$kernel_name' kernel, falling back to Python"
# https://docs.python.org/3/using/cmdline.html#envvar-PYTHONSTARTUP
export PYTHONSTARTUP=~/.pythonrc
python -m ipykernel_launcher -f ${connection_file} --debug
;;
esac
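
The `case` statement in `run.sh` maps the `kernel_name` field of the connection file to a launch command and falls back to Python for anything it does not recognize. A rough Python sketch of that dispatch is below. The function name `kernel_command` is hypothetical, and the sketch deliberately omits the `PYTHONSTARTUP` export and the secrets injection that the script performs.

```python
import json
from typing import List


def kernel_command(connection_json: str,
                   connection_file: str = "/tmp/connection_file.json") -> List[str]:
    """Pick the kernel launch command the way run.sh's case statement does."""
    # A missing key yields None here (jq -r would print "null"); both fall through.
    kernel_name = json.loads(connection_json).get("kernel_name")
    if kernel_name == "ir":
        return ["R", "--slave", "-e", "IRkernel::main()", "--args", connection_file]
    if kernel_name in ("julia", "julia-1.6"):
        return [
            "julia", "-i", "--color=yes", "--project=/etc/noteable/project",
            "/opt/julia/packages/IJulia/e8kqU/src/kernel.jl", connection_file,
        ]
    # "python", "python3", and any unrecognized name all start the Python kernel.
    return ["python", "-m", "ipykernel_launcher", "-f", connection_file, "--debug"]


if __name__ == "__main__":
    print(kernel_command('{"kernel_name": "ir"}'))
```

Writing the fallback as the final `return` keeps the Python launch command in one place, whereas the shell script repeats it in both the `python | python3` and `*` branches.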