# runpod-session.sh

A single bash script that spins up (or resumes) a RunPod GPU pod running Ollama, waits for it to become reachable, warms up your models into VRAM, and patches your `opencode.jsonc` to point at the live pod — all in one command.
## Requirements

- `curl` and `jq`
- A RunPod account with:
  - An API key
  - A network volume (for persistent model storage)
- opencode installed and configured with a `runpod` provider block
## Installation

```bash
chmod +x runpod-session.sh
# optionally symlink to somewhere on your PATH:
ln -s "$(pwd)/runpod-session.sh" ~/.local/bin/runpod-session
```
On first run the script creates `~/.config/runpod-session/config` with defaults and exits — edit it, then re-run.
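That first-run behaviour amounts to something like this (a sketch, not the script's exact code):

```bash
# Sketch of the first-run bootstrap described above; details may differ from the script.
CONFIG_DIR="$HOME/.config/runpod-session"
CONFIG_FILE="$CONFIG_DIR/config"

if [[ ! -f "$CONFIG_FILE" ]]; then
    mkdir -p "$CONFIG_DIR"
    cat > "$CONFIG_FILE" <<'EOF'
# runpod-session configuration -- see the Configuration section for all keys
RUNPOD_API_KEY=""
EOF
    echo "Created $CONFIG_FILE -- edit it, then re-run." >&2
    exit 1
fi

# The config is plain bash, so it is simply sourced.
source "$CONFIG_FILE"
```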
## Configuration

Config file: `~/.config/runpod-session/config`
```bash
# runpod-session configuration
RUNPOD_API_KEY="rpa_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Network volume name exactly as shown in RunPod dashboard
NETWORK_VOLUME_NAME="my-storage"

# Path where Ollama looks for models on the network volume
OLLAMA_MODELS_PATH="/workspace/models"

# Must match the key in your opencode.jsonc "provider" block
OPENCODE_PROVIDER="runpod"

# Model to activate by default (used when --model is not passed)
DEFAULT_MODEL="qwen3-coder:latest"

# All models stored on this pod — space-separated Ollama tags.
# These get registered in opencode.jsonc and are warmed up with --all-models.
WARMUP_MODELS="qwen3-coder:latest translategemma:27b"

# Preferred GPU display name (partial match, case-insensitive). Empty = show all within budget.
DEFAULT_GPU_TYPE="RTX PRO 6000"

# Hard $/hr ceiling — only GPUs at or below this price are shown
MAX_PRICE_PER_HR=2.50

# Pod configuration
CONTAINER_DISK_GB=15
GPU_COUNT=1
STARTUP_TIMEOUT=240     # seconds to wait for Ollama to become reachable
WARMUP_NUM_CTX=32768    # context size used when pre-loading models into VRAM

# External tool configs to patch with the live pod URL (leave empty to skip)
TRANSART_SCRIPT=""      # e.g. /home/user/bin/transart.py
PUBLISHER_CONFIG=""     # e.g. /home/user/.config/my-publisher/config.toml
```
## Usage

```
runpod-session.sh [OPTIONS]

Options:
  --model MODEL       Ollama model tag to warm up (default: DEFAULT_MODEL in config)
  --all-models        Warm up ALL models listed in WARMUP_MODELS
  --gpu-type 'NAME'   Preferred GPU display name (partial match, case-insensitive)
  --max-price PRICE   Max $/hr ceiling (overrides config)
  --new               Force creation of a new pod (skip restart logic)
  --stop              Stop the current running pod
  --status            Show current session state and reachability
  --help              Show this help
```
### Typical workflow

```bash
# Start a session (warm up the default model)
runpod-session.sh

# Start with a specific model
runpod-session.sh --model qwen3-coder:latest

# Warm up all configured models
runpod-session.sh --all-models

# Check what's running and how much it's costing
runpod-session.sh --status

# Stop the pod when done (pod is stopped, not terminated — network volume persists)
runpod-session.sh --stop

# Force a fresh pod (terminates any existing stopped pod, creates a new one)
runpod-session.sh --new --gpu-type "RTX 4090" --max-price 1.80
```
## How it works

- Existing pod check — queries RunPod for any pod with "ollama" in its name (see the query sketch after this list).
  - If running and reachable: skips straight to warmup.
  - If stopped/exited: prompts to restart, delete, or abort.
  - `--new` skips this check entirely.
- GPU selection — queries the RunPod GPU catalog for secure-cloud instances within your price ceiling, sorted by price. If `DEFAULT_GPU_TYPE` is set, matching GPUs are shown first. Each candidate is shown with its VRAM and price; you confirm, skip (n), or abort (a) for each one. If a GPU you confirm turns out to have no available machines (SUPPLY_CONSTRAINT), the script moves on to the next candidate automatically.
- Pod creation — deploys `ollama/ollama:latest` on the chosen GPU with your network volume mounted at `/workspace`. Ollama is configured to listen on `0.0.0.0:11434` and keep models in VRAM for 1 hour after last use.
- Startup wait — polls `https://<pod-id>-11434.proxy.runpod.net/api/tags` every 5 seconds until Ollama responds, up to `STARTUP_TIMEOUT` seconds (sketched below).
- Config patching — updates your opencode config and any external tools configured via `TRANSART_SCRIPT` / `PUBLISHER_CONFIG` (sketched below):
  - `provider.runpod.options.baseURL` → the live pod URL
  - `model` → `runpod/<DEFAULT_MODEL>`
  - `provider.runpod.models` → merges in all `WARMUP_MODELS` (existing per-model settings are preserved via a jq recursive merge)
  - `OLLAMA_HOST` in `transart.py` → bare pod URL (no `/v1`)
  - `ollama_host` in `my-publisher/config.toml` → bare pod URL (no `/v1`)

  A `.bak` copy is written before each file is modified. Entries left empty in the config are skipped.
- Model warmup — sends a short generation request to load the model into VRAM at `WARMUP_NUM_CTX` context length, so the first real request isn't slow (sketched below).
- State file — saves the pod ID, URL, model, and a timestamp to `~/.config/runpod-session/state.json` for use by `--status` and `--stop`.
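For concreteness, the pod lookup in the first step maps onto RunPod's GraphQL API. A rough sketch (the exact query shape is an assumption, not lifted from the script):

```bash
# List your pods and keep those with "ollama" in the name.
curl -s "https://api.runpod.io/graphql?api_key=$RUNPOD_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"query": "query { myself { pods { id name desiredStatus } } }"}' |
  jq -r '.data.myself.pods[] | select(.name | contains("ollama"))
         | "\(.id)\t\(.name)\t\(.desiredStatus)"'
```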
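The startup wait is a plain polling loop. A minimal sketch, assuming `POD_ID` comes out of the creation step and `STARTUP_TIMEOUT` out of the config:

```bash
# Poll the RunPod proxy URL until Ollama answers or the timeout is hit.
OLLAMA_URL="https://${POD_ID}-11434.proxy.runpod.net"
elapsed=0
until curl -sf --max-time 4 "$OLLAMA_URL/api/tags" >/dev/null; do
    if (( elapsed >= STARTUP_TIMEOUT )); then
        echo "Ollama did not become reachable within ${STARTUP_TIMEOUT}s" >&2
        exit 1
    fi
    sleep 5
    (( elapsed += 5 ))
done
echo "Ollama is up at $OLLAMA_URL"
```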
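The opencode patch can be expressed as a single jq program. A sketch, assuming the file currently parses as plain JSON (jq does not understand jsonc's `//` comments, so the real script has to work around them) and that the opencode `baseURL` carries a `/v1` suffix, which the "no `/v1`" notes for the other tools suggest:

```bash
OPENCODE_JSON="$HOME/.config/opencode/opencode.jsonc"
cp "$OPENCODE_JSON" "$OPENCODE_JSON.bak"

# Build {"tag": {}, ...} from the space-separated WARMUP_MODELS list.
models_json=$(printf '%s\n' $WARMUP_MODELS | jq -Rn '[inputs | {(.): {}}] | add')

jq --arg url "$OLLAMA_URL/v1" \
   --arg model "runpod/$DEFAULT_MODEL" \
   --argjson models "$models_json" '
     .provider.runpod.options.baseURL = $url
     | .model = $model
     # recursive merge: existing per-model settings win over the empty stubs
     | .provider.runpod.models = ($models * .provider.runpod.models)
   ' "$OPENCODE_JSON.bak" > "$OPENCODE_JSON"
```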
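Finally, the warmup is one small generation request against Ollama's `/api/generate` endpoint. The `keep_alive` here mirrors the 1-hour VRAM policy mentioned under pod creation; the prompt content is irrelevant:

```bash
# Load the model into VRAM at the configured context size with a throwaway prompt.
curl -s "$OLLAMA_URL/api/generate" -d @- <<EOF >/dev/null
{
  "model": "$DEFAULT_MODEL",
  "prompt": "hi",
  "stream": false,
  "keep_alive": "1h",
  "options": { "num_ctx": $WARMUP_NUM_CTX }
}
EOF
```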
## opencode provider setup

Your `~/.config/opencode/opencode.jsonc` needs a `runpod` provider block before running the script. The script will fill in the `baseURL` on each session start:
```jsonc
{
  "provider": {
    "runpod": {
      "options": {
        "baseURL": "" // filled in automatically by runpod-session.sh
      },
      "models": {}
    }
  },
  "model": "runpod/qwen3-coder:latest"
}
```
## Files

| Path | Purpose |
|---|---|
| `~/.config/runpod-session/config` | Main config (sourced as bash) |
| `~/.config/runpod-session/state.json` | Last session record |
| `~/.config/opencode/opencode.jsonc` | Patched on each session start |
| `~/.config/opencode/opencode.jsonc.bak` | Backup written before each patch |
| `$TRANSART_SCRIPT` | `OLLAMA_HOST` updated if set in config |
| `$PUBLISHER_CONFIG` | `ollama_host` updated if set in config |
## Cost notes

- Running pod: billed at the GPU hourly rate shown during selection.
- Stopped pod: network volume storage continues at ~$0.002/hr — terminate the pod if you no longer need it.
- `--stop` stops (does not terminate) the pod so it can be quickly restarted without losing the volume.
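Stopping boils down to one GraphQL mutation against the pod ID recorded in the state file. A sketch (the `pod_id` field name in `state.json` is a guess):

```bash
# Stop -- not terminate -- the last session's pod.
POD_ID=$(jq -r '.pod_id' ~/.config/runpod-session/state.json)
curl -s "https://api.runpod.io/graphql?api_key=$RUNPOD_API_KEY" \
  -H 'Content-Type: application/json' \
  -d "{\"query\": \"mutation { podStop(input: {podId: \\\"$POD_ID\\\"}) { id desiredStatus } }\"}"
```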
