Geospatial Input Data & Webservices

Jonathan Bloedow edited this page Feb 5, 2025 · 6 revisions

Problem

A spatial disease model needs easy access to spatial population input data. Synthetic data works for some cases, but real work calls for real data. Well-known global population data sources exist, as do sources of shapefiles for administrative boundaries. Researchers could write their own scripts to retrieve and process population data, but a turnkey solution accelerates their workflow and ensures consistency.

There is existing reliable Python code which does this sort of thing in emod-api (from Prashanth) and RasterTools (from Kurt).

Prototype

We wanted to prototype a new solution (with a lot of help from AI) that takes a country code and an administrative level and returns a CSV of the total population in each administrative unit at that level (for example, each LGA). We wanted this solution to download population data and shapefiles from a reliable webservice (or, honestly, an FTP server).

The code we came up with can be found in this DeepNote app (and associated notebook): https://deepnote.com/app/idm/LASER-095fb38b-bc23-4447-ab35-79eb252c9bd3
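The core of that computation is a zonal sum: add up the population cells that fall inside each administrative unit. The following is a minimal sketch of the idea, assuming the boundaries have already been rasterized onto the same grid as the population raster so that every cell carries a zone identifier. The function names, the toy grids, and the nodata value are illustrative and are not taken from the notebook, which works on real rasters and shapefiles.

```python
import csv
import io
from collections import defaultdict


def population_by_zone(pop_grid, zone_grid, nodata=None):
    """Sum population cells per zone id.

    Cells whose zone is None lie outside every zone and are skipped,
    as are cells whose population equals the nodata marker.
    """
    totals = defaultdict(float)
    for pop_row, zone_row in zip(pop_grid, zone_grid):
        for pop, zone in zip(pop_row, zone_row):
            if zone is None or pop is None or pop == nodata:
                continue
            totals[zone] += pop
    return dict(totals)


def to_csv(totals):
    """Render the per-zone totals as CSV text, one row per zone, sorted by zone id."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["zone", "population"])
    for zone in sorted(totals):
        writer.writerow([zone, totals[zone]])
    return buf.getvalue()


# Toy 2x3 grids: population counts and the zone each cell belongs to.
pop_grid = [[10.0, 20.0, 5.0],
            [0.0, 15.0, -1.0]]   # -1.0 marks a cell with no data
zone_grid = [["A", "A", "B"],
             ["A", "B", "B"]]

totals = population_by_zone(pop_grid, zone_grid, nodata=-1.0)
print(to_csv(totals))
```

With real data, the zone grid would come from rasterizing the shapefile polygons, which is where the heavier geospatial libraries come in.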

Data Sources

The prototype uses data.worldpop.org for population data and geoBoundaries.org for shapefiles. geoBoundaries has a RESTful webservice that makes it relatively easy to retrieve data programmatically by country and admin level (e.g., https://www.geoboundaries.org/api/current/gbOpen/CAN/ADM2/). WorldPop data appears to be available at URLs that follow a predictable pattern, one that can be matched with a regular expression. We have no control, of course, over how long those particular URLs and services remain valid.
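As a rough illustration of building these URLs programmatically, here is a small sketch. The geoBoundaries form follows the example URL above, and the accepted admin range 0-5 mirrors the range the prototype webservice advertises. The WorldPop template is an assumption made for illustration only; it is not taken from the notebook and should be checked against data.worldpop.org before use. The helper names are hypothetical.

```python
import re

# Base path taken from the geoBoundaries example URL on this page.
GEOBOUNDARIES_API = "https://www.geoboundaries.org/api/current/gbOpen"

# ASSUMPTION: illustrative WorldPop path template, not confirmed by this page.
WORLDPOP_TEMPLATE = (
    "https://data.worldpop.org/GIS/Population/Global_2000_2020/"
    "{year}/{iso_upper}/{iso_lower}_ppp_{year}.tif"
)


def _check_iso3(iso3):
    """Return the upper-cased code, or raise if it is not three ASCII letters."""
    if not re.fullmatch(r"[A-Za-z]{3}", iso3):
        raise ValueError(f"expected a three-letter ISO code, got {iso3!r}")
    return iso3.upper()


def geoboundaries_url(iso3, adm_level):
    """Build the geoBoundaries API URL for a country and admin level."""
    iso = _check_iso3(iso3)
    if not isinstance(adm_level, int) or not 0 <= adm_level <= 5:
        raise ValueError(f"admin level must be an integer in 0..5, got {adm_level!r}")
    return f"{GEOBOUNDARIES_API}/{iso}/ADM{adm_level}/"


def worldpop_url(iso3, year):
    """Build a WorldPop raster URL from the assumed template above."""
    iso = _check_iso3(iso3)
    return WORLDPOP_TEMPLATE.format(year=year, iso_upper=iso, iso_lower=iso.lower())


print(geoboundaries_url("can", 2))
print(worldpop_url("NGA", 2020))
```

Keeping the templates in one place means only a single constant needs updating if either provider changes its URL layout.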

Dependencies

The DeepNote notebook code can be brought into laser-core utils. The biggest problem with that, however, is that these geospatial scripts tend to have several substantial dependencies, e.g., geopandas, shapely, and rasterio. There is a good argument against imposing additional environment requirements on users for "just" this capability. That is why we consider Docker containers and webservices as options.
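If the code were brought into a utils module anyway, one common way to soften the dependency burden is to import the heavy packages only when the feature is actually used. This is not one of the options considered on this page; it is a minimal sketch of the general pattern, and the `require` helper is hypothetical.

```python
import importlib


def require(module_name, feature):
    """Import an optional dependency on demand, with a helpful error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} needs the optional package '{module_name}'. "
            f"Install it with: pip install {module_name}"
        ) from exc


# A stdlib module stands in for a heavy dependency such as geopandas.
json_mod = require("json", "demo feature")
print(json_mod.dumps({"ok": True}))

# A missing package produces an actionable message instead of a bare traceback.
try:
    require("no_such_geospatial_package", "Population CSV generation")
except ImportError as err:
    print(err)
```

Users who never call the geospatial function would then never need geopandas, shapely, or rasterio installed.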

Containers

If end users have docker installed on their machines, we can provide a docker image which has the complete environment ready-to-go and the user can do something like:

docker run -v $(pwd):/data pop_tool_image:latest tool.sh --iso NGA --adm 2

This approach assumes users have Docker installed in some form (or are willing to install it). Such a requirement might add to the perceived complexity of the toolset.

Webservices

A webservice lets users retrieve population data with standard HTTP GET requests. Standing up the population input file generator as a webservice makes the data reachable from a browser, the command line, or a script with zero setup.

There is a simple prototype webservice running here:

http://ipadvapp06.linux.idm.ctr:8080/popcsv

Hitting that URL in a browser, or on the command line with wget, returns a response like:

Please provide iso=XXX and adm=[0..5].

A more complete version of the service would instead return a semi-formal schema in JSON format, e.g.:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Population Input File Service",
  "description": "A web service that provides population input files for Local Government Areas (LGAs) based on a country ISO3 code and administrative level.",
  "type": "object",
  "properties": {
    "iso": {
      "type": "string",
      "description": "Three-letter ISO 3166-1 country code (e.g., 'NGA' for Nigeria).",
      "pattern": "^[A-Z]{3}$",
      "examples": ["NGA", "KEN", "USA"]
    },
    "adm": {
      "type": "integer",
      "description": "Administrative level (0 for country, 1 for states/provinces, 2 for LGAs).",
      "enum": [0, 1, 2],
      "examples": [0, 1, 2]
    }
  },
  "required": ["iso", "adm"],
  "examples": [
    {
      "iso": "NGA",
      "adm": 2
    }
  ]
}
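A client can also use such a schema directly to check a request before sending it. The sketch below hand-rolls just the few keywords the schema above uses (required, type, pattern, enum) and then builds the /popcsv URL; it is not a general JSON Schema validator, for which a dedicated library would be the usual choice. The function names are hypothetical, and the schema is a trimmed copy of the one above.

```python
import re
from urllib.parse import urlencode

# Trimmed copy of the schema shown above.
SCHEMA = {
    "properties": {
        "iso": {"type": "string", "pattern": "^[A-Z]{3}$"},
        "adm": {"type": "integer", "enum": [0, 1, 2]},
    },
    "required": ["iso", "adm"],
}

# Host taken from the prototype URL on this page.
BASE_URL = "http://ipadvapp06.linux.idm.ctr:8080"


def validate_request(schema, params):
    """Return a list of problems; an empty list means the request looks valid."""
    errors = []
    for name in schema.get("required", []):
        if name not in params:
            errors.append(f"missing required parameter '{name}'")
    for name, value in params.items():
        rules = schema.get("properties", {}).get(name)
        if rules is None:
            continue  # parameters the schema does not describe are left alone
        expected = rules.get("type")
        if expected == "string" and not isinstance(value, str):
            errors.append(f"'{name}' must be a string")
            continue
        # bool is a subclass of int in Python, so exclude it explicitly
        if expected == "integer" and (isinstance(value, bool) or not isinstance(value, int)):
            errors.append(f"'{name}' must be an integer")
            continue
        if "pattern" in rules and isinstance(value, str) and not re.search(rules["pattern"], value):
            errors.append(f"'{name}' does not match {rules['pattern']}")
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"'{name}' must be one of {rules['enum']}")
    return errors


def popcsv_url(schema, iso, adm):
    """Validate the parameters, then build the GET URL for the /popcsv endpoint."""
    params = {"iso": iso, "adm": adm}
    problems = validate_request(schema, params)
    if problems:
        raise ValueError("; ".join(problems))
    return f"{BASE_URL}/popcsv?{urlencode(params)}"


print(popcsv_url(SCHEMA, "NGA", 2))
print(validate_request(SCHEMA, {"iso": "nga", "adm": 7}))
```

The resulting URL can be passed to wget, a browser, or any HTTP client.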

This is sufficient for a user to hand to ChatGPT with a prompt such as: "Give me a Python script which can use this webservice interactively."

E.g.,

#!/usr/bin/env python3
import requests
import zipfile
import io
import os

# Base URL of the web service
BASE_URL = "http://ipadvapp06.linux.idm.ctr:8080"

def fetch_schema():
    """Fetches the API schema to guide user input."""
    response = requests.get(BASE_URL + "/")
    if response.status_code == 200:
        return response.json()
    else:
        print("[ERROR] Failed to fetch API schema. Please check if the server is running.")
        raise SystemExit(1)  # the bare exit() helper is meant for interactive sessions

def get_user_input(schema):
    """Prompts the user for ISO, admin level, and visualization option."""
    print("\n=== Population Data Request ===")
    
    iso_code = input("Enter 3-letter ISO country code (e.g., KEN, NGA, ETH): ").strip().upper()
    
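    # Prefer the levels advertised by the schema's "adm" enum; fall back to 0-3 if absent
    adm_enum = schema.get("properties", {}).get("adm", {}).get("enum")
    valid_admin_levels = list(adm_enum) if adm_enum else list(range(4))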
    while True:
        try:
            adm_level = int(input(f"Enter administrative level {valid_admin_levels}: ").strip())
            if adm_level in valid_admin_levels:
                break
            else:
                print(f"[ERROR] Invalid admin level. Choose from {valid_admin_levels}.")
        except ValueError:
            print("[ERROR] Please enter a valid integer.")

    png_option = input("Include visualization (Y/N)? ").strip().lower()
    png_flag = "1" if png_option == "y" else "0"

    return iso_code, adm_level, png_flag

def fetch_population_data(iso, adm, png):
    """Fetches population data and processes response."""
    params = {"iso": iso.upper(), "adm": adm, "png": png}
    print(f"[INFO] Sending request to {BASE_URL}/popcsv with parameters: {params}")

    response = requests.get(BASE_URL + "/popcsv", params=params, stream=True)

    if response.status_code == 200:
        content_type = response.headers.get("Content-Type", "")

        if "application/zip" in content_type:
            # If ZIP file is returned
            zip_buffer = io.BytesIO(response.content)
            with zipfile.ZipFile(zip_buffer, 'r') as zip_ref:
                zip_ref.extractall("downloads")
                print("[SUCCESS] Files saved in 'downloads/' directory.")
                for file in zip_ref.namelist():
                    print(f" - {file}")

                # If PNG exists, display it
                if "population_map.png" in zip_ref.namelist():
                    try:
                        from PIL import Image
                        img_path = os.path.join("downloads", "population_map.png")
                        img = Image.open(img_path)
                        img.show()  # Opens the image using the default viewer
                    except ImportError:
                        print("[INFO] Install PIL (Pillow) to view images.")
        else:
            # If CSV file is returned
            os.makedirs("downloads", exist_ok=True)  # open() fails if the directory is missing
            csv_filename = f"downloads/{iso.lower()}_adm{adm}_population.csv"
            with open(csv_filename, "wb") as f:
                f.write(response.content)
            print(f"[SUCCESS] CSV file saved as '{csv_filename}'")
    else:
        print(f"[ERROR] Failed to fetch data. Status: {response.status_code}")
        print(response.text)

def main():
    """Main loop for user interaction."""
    schema = fetch_schema()

    while True:
        iso, adm, png = get_user_input(schema)
        fetch_population_data(iso, adm, png)

        repeat = input("\nWould you like to make another request? (Y/N): ").strip().lower()
        if repeat != "y":
            print("Exiting.")
            break

if __name__ == "__main__":
    main()

Caveats

Some caveats about this early prototype webservice:

  • You have to be on the GF VPN (and even then you might have issues when connecting from home).
  • It handles only one request at a time (it is a single instance).
  • There is very little error handling.
  • It is slow to handle non-cached requests.
  • The webservice doesn't return the nice visualization you can see in the DeepNote notebook. Adding that as an optional returned asset would not be much work.
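Because non-cached requests are slow and the service handles one request at a time, a client may want to keep its own copy of results it has already fetched. The following is a minimal sketch of a local on-disk cache, assuming the result depends only on the country code and admin level. The `cached_csv` helper and the stand-in fetch function are hypothetical; the file naming follows the script above.

```python
import tempfile
from pathlib import Path


def cached_csv(iso, adm, fetch, cache_dir):
    """Return CSV bytes for (iso, adm), calling fetch(iso, adm) only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{iso.lower()}_adm{adm}_population.csv"
    if path.exists():
        return path.read_bytes()
    data = fetch(iso, adm)
    path.write_bytes(data)
    return data


# Stand-in for the real HTTP request, so the sketch runs without network access.
calls = []


def fake_fetch(iso, adm):
    calls.append((iso, adm))
    return b"zone,population\nA,30.0\n"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_csv("NGA", 2, fake_fetch, tmp)
    second = cached_csv("NGA", 2, fake_fetch, tmp)  # served from disk, no second fetch

print(len(calls))
```

In real use, the fetch argument would wrap the GET request to /popcsv, and the cache directory would be a persistent location rather than a temporary one.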

WebServices at IDM

There is still no good go-to solution for webservices at scale in the organization, especially public-facing ones. That has been a long-term work in progress.
