# Geospatial Digital Special Collections - Basic Use

## kube.ipynb jupyter notebook

The Python scripts in the kube.ipynb Jupyter notebook in the tools repository provide basic functionality to spin up a dataset into a PostGIS Kubernetes container, making several services available. Specifically:

- connections to the database (for the ArcGIS data store and other applications that consume Postgres connections)
- a set of dynamic vector tiles
- a limited set of direct APIs to query the data
- an indexed landing page for the dataset
- [coming soon] OGC map server and feature server
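Once a dataset pod is up, the database connection behaves like any other Postgres endpoint. Below is a minimal connection-check sketch with `psql`; every value in it is a placeholder assumption (host, port, database name, and user), not the tool set's real configuration — the actual values come from the output of the spin-up scripts.

```shell
# Connection check sketch -- every value below is a placeholder;
# substitute the host, port, database, and credentials reported
# when the dataset pod is spun up.
export PGHOST=localhost
export PGPORT=5432
export PGDATABASE=my_dataset   # hypothetical dataset name
export PGUSER=postgres

# Confirm the server answers and PostGIS is installed in the container.
psql -c "SELECT postgis_full_version();"
```

The same environment variables work for any client that consumes Postgres connections, including the ArcGIS data store registration mentioned above.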

## data ingest process

The basic flow of the extract-transform-load (ETL) process is documented in the GDSC ingest process diagram. This diagram is best opened in the online application https://app.diagrams.net/.

## other functionality

Other functionality in the tool set includes:

- spin down and delete the pod hosting a dataset
- see all running pods
- review metadata
- update all JSON metadata on disk

## typical workflow

This is a basic set of steps for a common workflow with GDSC:

- start Docker Desktop
- open a bash terminal (macOS) or PowerShell (Windows) and navigate to the kubernetes directory
- type `postgis.sh -l` (macOS) or `postgis.ps1 -l` (Windows)
- once the system comes up, navigate to the tools directory
- type `git pull origin main` (to get the recent changes from the repository)
- type `jupyter notebook` and wait for the notebook to start in a web browser
- in the web browser, navigate to the jupyter/kube.ipynb notebook and double-click it
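The startup steps above can be condensed into a single shell session. This is a macOS/bash sketch that assumes the kubernetes and tools directories are siblings in your checkout; adjust the paths to your own layout (Windows users would run the `.ps1` equivalents).

```shell
# Start-of-day sequence (after launching Docker Desktop manually).
cd kubernetes
./postgis.sh -l          # bring the local system up

cd ../tools
git pull origin main     # sync the latest changes from the repository
jupyter notebook         # then open jupyter/kube.ipynb in the browser
```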

At this point you are ready to do work and use the tools. When you are done, follow these steps to leave a clean workspace.

- in the Jupyter notebook, clear all output from the cells (in the Edit menu, under Clear Outputs of All Cells)
- wait for the notebook to autosave (usually about one minute)
- from the File menu, choose Close and Shut Down Notebook; this will close the browser tab with the notebook
- find the browser tab with the Jupyter file system interface and select Shutdown from the File menu
- return to the bash terminal (macOS) or PowerShell (Windows) and navigate to the kubernetes directory
- type `cleanup.sh -l` (macOS) or `cleanup.ps1 -l` (Windows)
- once the script has finished, quit Docker Desktop
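The teardown can likewise be sketched as one short session (macOS/bash shown, same directory-layout assumption as the startup sketch). Run it only after the notebook has been shut down from the Jupyter interface.

```shell
# End-of-day sequence; run from the parent of the kubernetes directory.
cd kubernetes
./cleanup.sh -l
# ...then quit Docker Desktop once the script finishes.
```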

## debugging the jupyter notebook on localhost

NOTE: none of the notes below apply to the kmaster.idsc.miami.edu control plane.

As most of the code is still a work in progress, the tools will sometimes fail. Most often the failure occurs when you run the `main()` function to ETL data. The stage at which the ETL process fails determines where to look for the cause.

A good first check for a general failure is to open a shell and type `kubectl get pods -n gdsc`, which lists all running pods. If you do not see something that ends similar to the below, simply run the cleanup script in the kubernetes directory and start again ...
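That first check can be wrapped in a small script. This is a sketch, not part of the tool set: the `pods_running` helper only parses `kubectl get pods` style text, so it can be tried even without a live cluster.

```shell
# Succeeds when at least one line of `kubectl get pods` output
# reports the Running status.
pods_running() {
    grep -q ' Running '
}

if kubectl get pods -n gdsc | pods_running; then
    echo "gdsc pods are up"
else
    echo "no running pods -- run the cleanup script and start again"
fi
```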

If you have run the `main()` function and it failed at

`Running scripts for ...`

this means the ETL failed in the `osgeo` container of the `postgis_osgeo` pod.

If you have run the `main()` function and it failed at

`Running SQL scripts for ...`

this means the ETL failed in the `postgis` container of the `postgis_osgeo` pod.
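Given those two failure signatures, a reasonable next step is to read the logs of the matching container. This is a hedged sketch: the exact pod name on your cluster may carry a generated suffix (and Kubernetes resource names use hyphens rather than underscores), so the grep pattern below is an assumption.

```shell
# Find the postgis_osgeo pod (the name pattern is an assumption) ...
POD=$(kubectl get pods -n gdsc -o name | grep postgis | head -n 1)

# ... then read the container that matches the failure message:
kubectl logs -n gdsc "$POD" -c osgeo     # failed at "Running scripts for ..."
kubectl logs -n gdsc "$POD" -c postgis   # failed at "Running SQL scripts for ..."
```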