# runpod-session.sh

A single bash script that spins up (or resumes) a RunPod GPU pod running Ollama, waits for it to be reachable, warms up your models into VRAM, and patches your `opencode.jsonc` to point at the live pod — all in one command.

## Requirements

- `curl` and `jq`
- A [RunPod](https://runpod.io) account with:
  - An API key
  - A network volume (for persistent model storage)
- [opencode](https://opencode.ai) installed and configured with a `runpod` provider block

## Installation

```bash
chmod +x runpod-session.sh
# optionally symlink to somewhere on your PATH:
ln -s "$(pwd)/runpod-session.sh" ~/.local/bin/runpod-session
```

On first run the script creates `~/.config/runpod-session/config` with defaults and exits — edit it, then re-run.

## Configuration

Config file: `~/.config/runpod-session/config`

```bash
# runpod-session configuration
RUNPOD_API_KEY="rpa_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Network volume name exactly as shown in RunPod dashboard
NETWORK_VOLUME_NAME="my-storage"

# Path where Ollama looks for models on the network volume
OLLAMA_MODELS_PATH="/workspace/models"

# Must match the key in your opencode.jsonc "provider" block
OPENCODE_PROVIDER="runpod"

# Model to activate by default (used when --model is not passed)
DEFAULT_MODEL="qwen3-coder:latest"

# All models stored on this pod — space-separated Ollama tags.
# These get registered in opencode.jsonc and are warmed up with --all-models.
WARMUP_MODELS="qwen3-coder:latest translategemma:27b"

# Preferred GPU display name (partial match, case-insensitive). Empty = show all within budget.
DEFAULT_GPU_TYPE="RTX PRO 6000"

# Hard $/hr ceiling — only GPUs at or below this price are shown
MAX_PRICE_PER_HR=2.50

# Pod configuration
CONTAINER_DISK_GB=15
GPU_COUNT=1
STARTUP_TIMEOUT=240   # seconds to wait for Ollama to become reachable
WARMUP_NUM_CTX=32768  # context size used when pre-loading models into VRAM

# External tool configs to patch with the live pod URL (leave empty to skip)
TRANSART_SCRIPT=""    # e.g. /home/user/bin/transart.py
PUBLISHER_CONFIG=""   # e.g. /home/user/.config/my-publisher/config.toml
```

## Usage

```
runpod-session.sh [OPTIONS]

Options:
  --model MODEL       Ollama model tag to warm up (default: DEFAULT_MODEL in config)
  --all-models        Warm up ALL models listed in WARMUP_MODELS
  --gpu-type 'NAME'   Preferred GPU display name (partial match, case-insensitive)
  --max-price PRICE   Max $/hr ceiling (overrides config)
  --new               Force creation of a new pod (skip restart logic)
  --stop              Stop the current running pod
  --status            Show current session state and reachability
  --help              Show this help
```

### Typical workflow

```bash
# Start a session (warm up default model)
runpod-session.sh

# Start with a specific model
runpod-session.sh --model qwen3-coder:latest

# Warm up all configured models
runpod-session.sh --all-models

# Check what's running and how much it's costing
runpod-session.sh --status

# Stop the pod when done (pod is stopped, not terminated — network volume persists)
runpod-session.sh --stop

# Force a fresh pod (terminates any existing stopped pod, creates new)
runpod-session.sh --new --gpu-type "RTX 4090" --max-price 1.80
```

## How it works

1. **Existing pod check** — queries RunPod for any pod with "ollama" in its name (see the sketches after this list).
   - If running and reachable: skips straight to warmup.
   - If stopped/exited: prompts to restart, delete, or abort.
   - `--new` skips this check entirely.
2. **GPU selection** — queries the RunPod GPU catalog for secure-cloud instances within your price ceiling, sorted by price. If `DEFAULT_GPU_TYPE` is set, matching GPUs are shown first. Each candidate is shown with its VRAM and price; you confirm, skip (`n`), or abort (`a`) for each one. If a GPU you confirm turns out to have no available machines (`SUPPLY_CONSTRAINT`), the script moves to the next candidate automatically.
3. **Pod creation** — deploys `ollama/ollama:latest` on the chosen GPU with your network volume mounted at `/workspace`. Ollama is configured to listen on `0.0.0.0:11434` and keep models in VRAM for 1 hour after last use.
4. **Startup wait** — polls `https://<pod-id>-11434.proxy.runpod.net/api/tags` every 5 seconds until Ollama responds, up to `STARTUP_TIMEOUT` seconds (sketched below).
5. **Config patching** — updates your opencode config and any external tools configured via `TRANSART_SCRIPT` / `PUBLISHER_CONFIG` (sketched below):
   - `provider.runpod.options.baseURL` → the live pod URL
   - `model` → `runpod/<model>`
   - `provider.runpod.models` → merges in all `WARMUP_MODELS` (existing per-model settings are preserved via jq recursive merge)
   - `OLLAMA_HOST` in `transart.py` → bare pod URL (no `/v1`)
   - `ollama_host` in `my-publisher/config.toml` → bare pod URL (no `/v1`)

   A `.bak` copy is written before each file is modified. Entries left empty in config are skipped.
6. **Model warmup** — sends a short generation request to load the model into VRAM at `WARMUP_NUM_CTX` context length, so the first real request isn't slow.
7. **State file** — saves pod ID, URL, model, and timestamp to `~/.config/runpod-session/state.json` for use by `--status` and `--stop`.
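A minimal sketch of the pod lookup in step 1, assuming RunPod's public GraphQL API; the endpoint and fields follow RunPod's documented schema, but the script's actual query may differ:

```bash
# Step 1 sketch: list the account's pods and pick any named "ollama".
curl -s "https://api.runpod.io/graphql?api_key=${RUNPOD_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d '{"query": "query { myself { pods { id name desiredStatus } } }"}' |
  jq -r '.data.myself.pods[]
         | select(.name | test("ollama"))
         | "\(.id)\t\(.name)\t\(.desiredStatus)"'
```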
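Steps 4 and 6 boil down to two calls against Ollama's standard HTTP API (`/api/tags` and `/api/generate`). A minimal sketch, assuming the config file has been sourced and that `POD_ID` and `MODEL` hold the pod ID and model tag (variable names here are illustrative):

```bash
BASE_URL="https://${POD_ID}-11434.proxy.runpod.net"

# Step 4: poll /api/tags until Ollama answers or STARTUP_TIMEOUT expires.
elapsed=0
until curl -sf --max-time 4 "$BASE_URL/api/tags" >/dev/null; do
  sleep 5
  elapsed=$((elapsed + 5))
  if [ "$elapsed" -ge "$STARTUP_TIMEOUT" ]; then
    echo "Ollama did not come up within ${STARTUP_TIMEOUT}s" >&2
    exit 1
  fi
done

# Step 6: a one-token generation forces the model into VRAM at the
# configured context size; keep_alive pins it there for an hour.
curl -s "$BASE_URL/api/generate" -d "$(jq -n \
  --arg model "$MODEL" --argjson ctx "$WARMUP_NUM_CTX" \
  '{model: $model, prompt: "hi", stream: false,
    options: {num_ctx: $ctx, num_predict: 1}, keep_alive: "1h"}')" >/dev/null
```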
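A minimal sketch of the opencode patch in step 5, with two caveats: jq parses strict JSON, not JSONC, so this assumes the file carries no comments at patch time (or that the script strips them first), and the hard-coded `runpod` key stands in for `OPENCODE_PROVIDER`. The `/v1` suffix follows from the note above that only the external tools get the bare URL:

```bash
CFG=~/.config/opencode/opencode.jsonc
cp "$CFG" "$CFG.bak"   # backup before modifying, as the script does

# Build {"tag": {}} stubs for every tag in WARMUP_MODELS.
NEW_MODELS=$(printf '%s\n' $WARMUP_MODELS | jq -Rn '[inputs | {(.): {}}] | add')

jq --arg url "$BASE_URL/v1" --arg model "runpod/$MODEL" --argjson new "$NEW_MODELS" '
  .provider.runpod.options.baseURL = $url
  | .model = $model
  # Recursive merge (*): the existing models object on the right wins,
  # so per-model settings already in the file are preserved.
  | .provider.runpod.models = ($new * (.provider.runpod.models // {}))
' "$CFG.bak" > "$CFG"
```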
## opencode provider setup

Your `~/.config/opencode/opencode.jsonc` needs a `runpod` provider block before running the script. The script will fill in the `baseURL` on each session start:

```jsonc
{
  "provider": {
    "runpod": {
      "options": {
        "baseURL": "" // filled in automatically by runpod-session.sh
      },
      "models": {}
    }
  },
  "model": "runpod/qwen3-coder:latest"
}
```

## Files

| Path | Purpose |
|------|---------|
| `~/.config/runpod-session/config` | Main config (sourced as bash) |
| `~/.config/runpod-session/state.json` | Last session record |
| `~/.config/opencode/opencode.jsonc` | Patched on each session start |
| `~/.config/opencode/opencode.jsonc.bak` | Backup written before each patch |
| `$TRANSART_SCRIPT` | `OLLAMA_HOST` updated if set in config |
| `$PUBLISHER_CONFIG` | `ollama_host` updated if set in config |

## Cost notes

- **Running pod**: billed at the GPU hourly rate shown during selection.
- **Stopped pod**: network volume storage continues at ~$0.002/hr — terminate the pod if you no longer need it.
- `--stop` stops (does not terminate) the pod so it can be quickly restarted without losing the volume.
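For reference, a minimal sketch of what `--stop` amounts to, assuming RunPod's documented `podStop` GraphQL mutation and that the state file stores the pod ID under a `pod_id` key (the field name is illustrative):

```bash
# Read the pod ID saved in step 7, then stop (not terminate) the pod.
POD_ID=$(jq -r '.pod_id' ~/.config/runpod-session/state.json)
curl -s "https://api.runpod.io/graphql?api_key=${RUNPOD_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d "$(jq -n --arg id "$POD_ID" \
        '{query: ("mutation { podStop(input: {podId: \"" + $id + "\"}) { id desiredStatus } }")}')"
# Terminating instead (podTerminate) releases the pod and stops its
# storage billing, per the cost notes above.
```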