~90 min read · updated 2026-05-15

The toolchain

Stand up JupyterHub with the DockerSpawner so every student gets a containerized GPU-aware notebook, plus self-hosted Gitea, uv for environments, and pre-commit.

By the end of this module you will have:

  • JupyterHub running on the GPU server, with each student getting their own Docker-isolated notebook container on login.
  • GPU pass-through wired into the spawner so a student can request a GPU at notebook-start time.
  • Gitea running as the source of truth for every repo in the course; a GitHub copy is, at best, a mirror.
  • uv for Python environment management inside notebook containers and on the shell.
  • Pre-commit wired into a per-student template repo, enforcing ruff, mypy, and trailing-whitespace on every commit.
  • A second ADR documenting why we chose DockerSpawner over the simpler alternatives.

This is a longer module than module 01. The toolchain is what every later module assumes — taking time here saves it twice over later.

Why JupyterHub, and why with the DockerSpawner

There are three popular ways to give a cohort of students notebook access:

| Option | What it is | When to pick it |
|---|---|---|
| Shared user, one Jupyter | Everyone SSHes in as the same user and runs jupyter lab | Never, for two or more students |
| TLJH (The Littlest JupyterHub) | JupyterHub with the SystemdSpawner: per-user processes, one shared Python environment | Small cohorts, no GPU isolation, simplest install |
| JupyterHub + DockerSpawner | Per-user containers, optional GPU pass-through | What we want: containerized isolation and per-user GPU bookings |

We pick the DockerSpawner, for two reasons:

  • Environment isolation. A student who pip installs something weird only breaks their own container, never the platform.
  • Per-notebook GPU bookings. A student can spawn a “GPU notebook” image that gets --gpus all, or a “CPU-only” image that doesn’t. This is the foundation of fair GPU sharing without Slurm getting involved for every quick experiment.

Step 1 — Install JupyterHub

Run JupyterHub itself in a container, alongside the per-user containers it spawns. This keeps the host clean and makes upgrades a matter of docker compose pull && docker compose up -d.

Create /opt/jupyterhub/docker-compose.yml:

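services:
  jupyterhub:
    image: quay.io/jupyterhub/jupyterhub:5.1
    container_name: jupyterhub
    restart: unless-stopped
    ports:
      - "8000:8000"                     # the proxy; the jump host's reverse proxy points here
    networks:
      - jupyterhub-network
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/jupyterhub:/srv/jupyterhub
      - /home:/home
      - /srv/shared:/srv/shared
    command: >
      bash -c "pip install dockerspawner==13.0.0 jupyterhub-nativeauthenticator
      && jupyterhub --config=/srv/jupyterhub/jupyterhub_config.py"

networks:
  jupyterhub-network:
    name: jupyterhub-network            # fixed name so DockerSpawner can attach notebooks to it

The hub container has the host's Docker socket bind-mounted; that's how it spawns per-user containers as siblings on the same Docker daemon. The hub and every notebook container join the jupyterhub-network bridge network, so the proxy reaches notebooks by their internal IP and notebooks reach the Hub API at the container name jupyterhub. Because the host's daemon creates the notebook containers, the paths in c.DockerSpawner.volumes are host paths: the per-user homes and shared assets you set up in module 01. The hub's own /home and /srv/shared mounts just let it see the same directories.

Save the spawner config as /opt/jupyterhub/jupyterhub_config.py:

c = get_config()  # noqa: F821  (injected by JupyterHub when it loads this file)

c.JupyterHub.authenticator_class = "nativeauthenticator.NativeAuthenticator"
c.Authenticator.admin_users = {"ze"}                          # the instructor
c.Authenticator.allow_all = False

c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"

# Hub and notebooks share a user-defined bridge network (see docker-compose.yml)
c.JupyterHub.hub_ip = "0.0.0.0"                 # Hub API listens on all interfaces in its container
c.JupyterHub.hub_connect_ip = "jupyterhub"      # notebooks reach the Hub by container name
c.DockerSpawner.network_name = "jupyterhub-network"

c.DockerSpawner.image = "quay.io/jupyter/scipy-notebook:2025-05-12"
c.DockerSpawner.remove = True
c.DockerSpawner.notebook_dir = "/home/jovyan/work"
c.DockerSpawner.volumes = {
    "/home/{username}": "/home/jovyan/work",
    "/srv/shared":     {"bind": "/srv/shared", "mode": "ro"},
    "/srv/shared/scratch": "/srv/shared/scratch",
}

# Two profiles the student picks at spawn time (display name -> image)
GPU_PROFILE = "GPU notebook (CUDA)"
c.DockerSpawner.allowed_images = {
    "CPU notebook": "quay.io/jupyter/scipy-notebook:2025-05-12",
    GPU_PROFILE:    "quay.io/jupyter/pytorch-notebook:cuda12-2025-05-12",
}

def gpu_for_image(spawner):
    """Attach every GPU when the GPU profile is chosen; otherwise attach none."""
    # user_options["image"] holds the profile *name* (the dict key above),
    # not the image reference, so match on the key.
    if spawner.user_options.get("image") == GPU_PROFILE:
        spawner.extra_host_config = {
            "runtime": "nvidia",
            "device_requests": [{
                "Driver": "nvidia",
                "Count": -1,                    # -1 = all GPUs, like --gpus all
                "Capabilities": [["gpu"]],
            }],
        }
    else:
        # Spawner objects are reused across restarts; clear any GPU config
        # left over from an earlier GPU session.
        spawner.extra_host_config = {}

c.Spawner.pre_spawn_hook = gpu_for_image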
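Two details make a GPU hook easy to get wrong. DockerSpawner's image form submits the profile name (the key of allowed_images), not the image reference. JupyterHub also reuses the same spawner object across a user's sessions. It can help to exercise a hook of this shape without a running hub. The sketch below assumes user_options["image"] carries the allowed_images key. A SimpleNamespace stands in for a real spawner and models only the two attributes the hook touches.

```python
from types import SimpleNamespace

GPU_PROFILE = "GPU notebook (CUDA)"  # must match the allowed_images key


def gpu_for_image(spawner):
    """Attach all GPUs for the GPU profile; clear GPU config otherwise."""
    if spawner.user_options.get("image") == GPU_PROFILE:
        spawner.extra_host_config = {
            "runtime": "nvidia",
            "device_requests": [
                {"Driver": "nvidia", "Count": -1, "Capabilities": [["gpu"]]}
            ],
        }
    else:
        spawner.extra_host_config = {}


# Stand-in for a DockerSpawner: only the attributes the hook reads and writes.
spawner = SimpleNamespace(user_options={}, extra_host_config={})

spawner.user_options = {"image": GPU_PROFILE}
gpu_for_image(spawner)
print(spawner.extra_host_config["device_requests"][0]["Count"])  # -1

# Same spawner object, next session on the CPU profile: GPU config is cleared.
spawner.user_options = {"image": "CPU notebook"}
gpu_for_image(spawner)
print(spawner.extra_host_config)  # {}
```

The else branch matters because of the object reuse. Without it, a student who once picked the GPU profile would keep every GPU on later CPU sessions.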

Start it:

cd /opt/jupyterhub
sudo docker compose up -d
sudo docker logs -f jupyterhub

JupyterHub is now serving on port 8000 of the GPU server. Through the jump host’s reverse proxy (from module 01’s ADR), students reach it at https://hub.example.com.

Step 2 — Confirm GPU pass-through

Have one student log in, pick “GPU notebook (CUDA)” at the spawner page, and run in a notebook cell:

import torch
print(torch.cuda.device_count(), [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])

Expect 2 ['NVIDIA L40S', 'NVIDIA L40S']. If you get 0, check three things, in order:

  • The spawned image ships CUDA-enabled PyTorch wheels. pytorch-notebook:cuda12-* does. A CPU-only PyTorch build always reports 0, and the default scipy-notebook has no PyTorch at all.
  • The host Docker daemon has the NVIDIA runtime: docker info | grep -i nvidia.
  • The device_requests block reached the spawned container: docker inspect <container> | grep -i devicerequests. If it is missing, check that the pre_spawn_hook's condition actually matches the profile the student picked.
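The grep in the last check works, but a structured check is less ambiguous when you script it. This is a minimal sketch that assumes the standard docker inspect JSON layout: an array of container objects whose HostConfig.DeviceRequests is either null or a list of requests. The function name gpu_requests is illustrative only.

```python
import json


def gpu_requests(inspect_output: str) -> list[dict]:
    """Return the GPU device requests recorded in `docker inspect` output.

    `docker inspect <container>` prints a JSON array with one object per
    container; device requests live under HostConfig.DeviceRequests and are
    null when none were made.
    """
    found = []
    for container in json.loads(inspect_output):
        requests = container.get("HostConfig", {}).get("DeviceRequests") or []
        for req in requests:
            # Capabilities is a list of lists, e.g. [["gpu"]]
            if any("gpu" in caps for caps in req.get("Capabilities") or []):
                found.append(req)
    return found
```

Capture the output with docker inspect <container> > inspect.json. Then pass the file's text to gpu_requests. An empty list means the hook did not fire for that container.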

Important fair-use note. With this config, every student who picks the GPU image gets --gpus all. For ≤6 students that’s fine if you have a working convention (“don’t kick off a 4-hour run during a live session”), but for serious multi-user GPU work, module 04’s Slurm queue is the durable answer. JupyterHub GPU pass-through is for quick experiments and small training; long jobs go through Slurm.
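A convention is easier to keep when you can see who is holding GPU memory. nvidia-smi can list compute processes in CSV form with nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits. The sketch below parses that output. It assumes each line is "PID, MiB" and skips lines where memory reads [N/A], which some setups report. The 20,000 MiB cut-off in heavy_users is a hypothetical threshold, not a recommendation.

```python
def gpu_memory_by_pid(csv_text: str) -> dict[int, int]:
    """Sum used GPU memory (MiB) per PID from nvidia-smi CSV output.

    A process using several GPUs appears once per GPU, so usage is summed.
    """
    usage: dict[int, int] = {}
    for line in csv_text.strip().splitlines():
        pid_str, mem_str = (field.strip() for field in line.split(","))
        if not mem_str.isdigit():  # e.g. "[N/A]"
            continue
        pid = int(pid_str)
        usage[pid] = usage.get(pid, 0) + int(mem_str)
    return usage


def heavy_users(csv_text: str, threshold_mib: int = 20_000) -> list[int]:
    """PIDs whose total GPU memory exceeds threshold_mib (hypothetical cut-off)."""
    return sorted(
        pid for pid, mib in gpu_memory_by_pid(csv_text).items() if mib > threshold_mib
    )
```

The PIDs are host PIDs. Run nvidia-smi on the GPU server itself rather than inside a notebook.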

Step 3 — Self-hosted git with Gitea

GitHub is fine, but every repo round-trips through the public internet, every push has to be scanned for leaked secrets, and you cannot kick off a build from a private Postgres trigger. Gitea fixes all three for the cost of one small container.

/opt/gitea/docker-compose.yml:

services:
  gitea:
    image: gitea/gitea:1.22
    container_name: gitea
    restart: unless-stopped
    environment:
      - USER_UID=1000
      - USER_GID=1000
      - GITEA__database__DB_TYPE=postgres
      - GITEA__database__HOST=postgres:5432
      - GITEA__database__NAME=gitea
      - GITEA__database__USER=gitea
      - GITEA__database__PASSWD=__gitea_db_password__
    ports:
      - "3000:3000"
      - "2222:22"
    volumes:
      - /var/lib/gitea:/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    depends_on:
      - postgres

  postgres:
    image: postgres:16
    container_name: gitea-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=gitea
      - POSTGRES_PASSWORD=__gitea_db_password__
      - POSTGRES_DB=gitea
    volumes:
      - /var/lib/gitea-postgres:/var/lib/postgresql/data

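Bring it up (cd /opt/gitea && sudo docker compose up -d) and open the web UI at http://<gpu-server>:3000. The first user who registers becomes the admin, so make that you, the instructor. Then create an account for each student on the Site Administration user-accounts page (Create User Account). From the CLI, use sudo docker exec -u git gitea gitea admin user create --username <student> --email <email> --random-password.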
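For a cohort list in a script, Gitea's admin REST API has a create-user endpoint, POST /api/v1/admin/users. This minimal sketch assumes an access token with admin scope, sent as Authorization: token <token>. It only builds the request with the standard library and does not send it. The function name, the student name and the email in any call are placeholders.

```python
import json
import secrets
import urllib.request


def create_user_request(
    base_url: str, admin_token: str, username: str, email: str
) -> tuple[urllib.request.Request, str]:
    """Build (but do not send) a Gitea admin API request creating one account.

    Returns the request and the generated temporary password.
    """
    password = secrets.token_urlsafe(16)  # temporary; hand it to the student
    payload = {
        "username": username,
        "email": email,
        "password": password,
        "must_change_password": True,     # forces a reset on first login
    }
    req = urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/admin/users",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"token {admin_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    return req, password
```

Sending is one extra call: urllib.request.urlopen(req). Do it once per student and record each temporary password somewhere private.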

Two policies to set on day one, before students start pushing:

  • Default branch protection on main: require PR review, disallow force-push, require status checks. Repository Settings → Branches.
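  • Token scopes: give each personal access token the narrowest scopes it needs (for example write:repository only, never admin). Gitea's token scopes are per permission category, not per repository, so a leaked write:repository token can push to every repo its owner can. For automation that touches a single repo, use a per-repo deploy key instead; that keeps one leaked credential to a one-repo blast radius.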

The Gitea Actions runner — for CI on every push — comes in module 15.

Step 4 — uv for Python environments

Inside a notebook container, the default pip and venv work fine for a quick experiment. But the moment a student wants a reproducible environment, requirements.txt is not enough.

uv (by Astral) is the replacement. Astral reports it as roughly 10–100× faster than pip, and it generates a real lockfile (uv.lock) that pins every transitive dependency. We use it both inside notebook containers and on the shell.

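Install it from a JupyterHub terminal. The installer puts uv in ~/.local/bin, which is outside the persisted /home/jovyan/work mount. The spawner deletes containers when they stop, so rerun the installer at the start of each session (or bake uv into the notebook images):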

curl -LsSf https://astral.sh/uv/install.sh | sh

Create a project the way the rest of the track will:

mkdir ~/work/hello-uv && cd ~/work/hello-uv
uv init --python 3.12
uv add pandas polars duckdb scikit-learn xgboost lightgbm matplotlib seaborn
uv run python -c "import polars as pl; print(pl.__version__)"

pyproject.toml now records the dependency intent, and uv.lock records the exact resolved version of every package, transitive dependencies included. Commit both files. Without the lockfile, reproducibility is gone the first time a transitive dependency releases a new version.

Why uv over conda? Two reasons. First, it's faster: a 60-package install takes seconds. Second, its lockfile is real and cross-platform, whereas an exported conda environment is tied to the platform it was built on and tends to break the moment you move machines. Conda still wins for non-Python binaries (CUDA-bundled distributions, R, BLAS), but for what this track does, uv is the better default.

Step 5 — Pre-commit

Pre-commit is a tiny git hook framework that runs linters and formatters before every commit. The point isn’t perfect code — it’s that nobody pushes whitespace noise or an import pdb left in by accident.

Inside the same hello-uv project:

uv add --dev pre-commit ruff mypy
uv run pre-commit install

Create .pre-commit-config.yaml:

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: debug-statements             # catches a leftover import pdb / breakpoint()
      - id: check-added-large-files
        args: ["--maxkb=512"]            # block accidental dataset commits
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9
    hooks:
      - id: ruff
        args: ["--fix"]
      - id: ruff-format
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.13.0
    hooks:
      - id: mypy
        additional_dependencies: ["pandas-stubs"]

Note the check-added-large-files hook with a 512 KB ceiling — this stops the most expensive failure mode on a shared server, which is “a student git adds a 4 GB parquet, pushes, and now Gitea is full.” Datasets go through DVC and MinIO (module 03), not git.
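The size rule itself is simple, which makes it easy to reason about when a commit is rejected. The following sketch approximates it: the file size is rounded up to whole KB and compared against the ceiling. It is an illustration, not the hook's actual code. The real hook also skips Git LFS-tracked files and has its own options. Both function names are hypothetical.

```python
import math
import os
import subprocess


def oversized(paths: list[str], max_kb: int = 512) -> list[str]:
    """Return the paths whose size, rounded up to whole KB, exceeds max_kb."""
    too_big = []
    for path in paths:
        size_kb = math.ceil(os.path.getsize(path) / 1024)
        if size_kb > max_kb:
            too_big.append(path)
    return too_big


def staged_new_files() -> list[str]:
    """Files newly added to the git index, i.e. what the hook looks at."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=A"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]
```

Inside a repo, oversized(staged_new_files()) previews which staged files the hook would block before you run git commit.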

Step 6 — First repo end to end

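Have each student first create an empty repository called first-repo in the Gitea web UI (New Repository, with repository initialization left unchecked, since Gitea does not create repos on push by default). Then run this once in a JupyterHub terminal:

cd ~/work
GITEA_HOST="<gpu-server>"                        # the host running Gitea (Step 3)
uv init --python 3.12 first-repo                 # creates the directory and runs git init
cd first-repo
uv add --dev pre-commit ruff mypy                # also writes uv.lock
cp ../hello-uv/.pre-commit-config.yaml .         # reuse the Step 5 hook config
uv run pre-commit install
git add .
git commit -m "Initial commit"                   # hooks run here; if one rewrites a file, git add . and commit again
git branch -M main                               # name the branch main, whatever git's default is
git remote add origin "http://<student>@${GITEA_HOST}:3000/<student>/first-repo.git"
git push -u origin main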

If the push fails on auth, generate a Gitea personal access token (Settings → Applications → Generate New Token, with the write:repository scope) and use it as the password.

That repo now exists in Gitea, it has a lockfile, it has pre-commit on every commit, and pushing it will (in module 15) trigger CI. It’s the shape every project on the platform will take.

ADR 0002 — DockerSpawner over TLJH

Drop this into /srv/shared/adr/0002-jupyter-spawner.md:

# ADR 0002 — JupyterHub: DockerSpawner over TLJH / SystemdSpawner

## Status
Accepted, 2026-05-15.

## Context
The cohort needs per-user notebook environments on the shared GPU server.
Three options were considered: TLJH (SystemdSpawner), DockerSpawner with
allowed images, or KubeSpawner with a local k3s cluster.

## Decision
DockerSpawner with two allowed images (CPU and GPU). GPU pass-through is
enabled via a pre_spawn_hook that adds the nvidia device-request when the
GPU image is selected.

## Consequences
- Pro: per-user environment isolation; bad `pip install` cannot break the host.
- Pro: GPU bookings are a profile choice at notebook-start time.
- Con: pinning the spawner image means students cannot trivially install
  system packages. Workaround: the cookiecutter (module 03) generates a
  per-project Dockerfile that derives from the spawner image, used for
  any project that needs extra OS-level deps.

## Alternatives considered
- TLJH / SystemdSpawner. Simpler, but no GPU isolation — every notebook would
  see both L40S devices regardless of the user's intent.
- KubeSpawner on k3s. Right answer for ≥20 students; overkill for ≤6.

Recap and what’s next

You now have:

  • A multi-user notebook platform with two image profiles (CPU and GPU) and verified GPU pass-through.
  • A self-hosted git server with branch protection and scoped tokens.
  • A modern Python toolchain (uv + pre-commit) that every project will use.
  • A working first-repo recipe a student can run in five minutes.

What you don’t have yet: a place to put data, a way to track experiments, and a project template so every student isn’t reinventing the layout. That’s module 03 — the reproducibility stack.

