author     Danilo M. <danix@danix.xyz>  2026-05-11 20:23:52 +0200
committer  Danilo M. <danix@danix.xyz>  2026-05-11 20:23:52 +0200
commit     5f0710065f3696d83163909192208b3324439fbd (patch)
tree       5a46ab04a0b413434d357e0340ef7033eeea7f24
download   ollama-runpod-5f0710065f3696d83163909192208b3324439fbd.tar.gz
           ollama-runpod-5f0710065f3696d83163909192208b3324439fbd.zip
Initial commit: runpod-session.sh with README and CLAUDE.md
Bash script to manage RunPod Ollama pod lifecycle for opencode: spin up / resume pod, wait for Ollama, patch opencode.jsonc baseURL, warm up models into VRAM. Includes per-GPU confirmation prompt and automatic fallback on SUPPLY_CONSTRAINT errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
-rw-r--r--  CLAUDE.md           51
-rw-r--r--  README.md          154
-rwxr-xr-x  runpod-session.sh  608
3 files changed, 813 insertions(+), 0 deletions(-)
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..4b1b242
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,51 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## What this is
+
+Single bash script (`runpod-session.sh`) that manages RunPod GPU pod lifecycle for running Ollama models, then patches `~/.config/opencode/opencode.jsonc` so opencode points at the live pod.
+
+## Running / testing
+
+```bash
+# Check syntax
+bash -n runpod-session.sh
+
+# Lint
+shellcheck runpod-session.sh
+
+# Run (requires RUNPOD_API_KEY in ~/.config/runpod-session/config)
+./runpod-session.sh --status
+./runpod-session.sh --model qwen3-coder:latest
+./runpod-session.sh --stop
+./runpod-session.sh --all-models
+./runpod-session.sh --new --gpu-type "RTX PRO 6000" --max-price 1.50
+```
+
+## Architecture
+
+All logic is in `main()` which runs these steps in order:
+
+1. **Existing pod check** — queries RunPod GraphQL API, matches pod by name containing "ollama"
+2. **Pod creation** — if none found: picks cheapest secure-cloud GPU under `MAX_PRICE_PER_HR`, creates pod with `ollama/ollama:latest` image, network volume mounted at `/workspace`
+3. **Wait for Ollama** — polls `https://<pod-id>-11434.proxy.runpod.net/api/tags` until `.models` appears
+4. **Patch opencode.jsonc** — updates `baseURL`, active `model`, and merges `WARMUP_MODELS` into provider models block using jq `*` merge (existing config wins on conflicts)
+5. **Warmup** — POST to `/api/generate` with a dummy prompt to load model into VRAM at `WARMUP_NUM_CTX` context length
+6. **Save state** — writes `~/.config/runpod-session/state.json` with pod_id, url, model, timestamp
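+
+A typical `state.json` written by step 6 (illustrative values — field names match `save_state()`):
+
+```json
+{
+  "pod_id": "abc123xyz",
+  "ollama_url": "https://abc123xyz-11434.proxy.runpod.net/v1",
+  "model": "qwen3-coder:latest",
+  "started_at": "2026-05-11T18:23:52Z"
+}
+```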
+
+## Key design constraints
+
+- All RunPod calls go through `gql()` — single curl wrapper that exits on API errors
+- Pod is identified by name matching `test("ollama"; "i")` — not by ID — so the name `ollama-session` set at creation must not change
+- `patch_opencode_config()` writes a `.bak` before touching opencode.jsonc; jq `*` merge means existing per-model settings survive (merge semantics sketched after this list)
+- `OLLAMA_MODELS_PATH` is only passed to the pod (as the `OLLAMA_MODELS` env var) when set in the config file — set it there if models live outside the default location on the network volume
+- GPU selection only queries `secureCloud == true` pods; community cloud is excluded
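+
+A minimal sketch of the jq `*` merge semantics (runnable anywhere jq is installed): the right-hand operand wins on key conflicts, which is why the script merges as `$patch * existing` rather than the reverse.
+
+```bash
+# Objects merge recursively; on a key conflict the right operand wins.
+jq -n '{"m": {"tools": true}} * {"m": {"tools": false, "temperature": 0.2}}'
+# => { "m": { "tools": false, "temperature": 0.2 } }
+```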
+
+## Config and state files
+
+| Path | Purpose |
+|------|---------|
+| `~/.config/runpod-session/config` | Sourced as bash; holds API key, defaults |
+| `~/.config/runpod-session/state.json` | Last session record (pod_id, url, model, timestamp) |
+| `~/.config/opencode/opencode.jsonc` | Patched in-place; `.bak` written before changes |
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..d12d1ec
--- /dev/null
+++ b/README.md
@@ -0,0 +1,154 @@
+# runpod-session.sh
+
+A single bash script that spins up (or resumes) a RunPod GPU pod running Ollama, waits for it to be reachable, warms up your models into VRAM, and patches your `opencode.jsonc` to point at the live pod — all in one command.
+
+## Requirements
+
+- `curl` and `jq`
+- A [RunPod](https://runpod.io) account with:
+ - An API key
+ - A network volume (for persistent model storage)
+- [opencode](https://opencode.ai) installed and configured with a `runpod` provider block
+
+## Installation
+
+```bash
+chmod +x runpod-session.sh
+# optionally symlink to somewhere on your PATH:
+ln -s "$(pwd)/runpod-session.sh" ~/.local/bin/runpod-session
+```
+
+On first run the script creates `~/.config/runpod-session/config` with defaults and exits — edit it, then re-run.
+
+## Configuration
+
+Config file: `~/.config/runpod-session/config`
+
+```bash
+# runpod-session configuration
+
+RUNPOD_API_KEY="rpa_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
+
+# Network volume name exactly as shown in RunPod dashboard
+NETWORK_VOLUME_NAME="my-storage"
+
+# Path where Ollama looks for models on the network volume
+OLLAMA_MODELS_PATH="/workspace/models"
+
+# Must match the key in your opencode.jsonc "provider" block
+OPENCODE_PROVIDER="runpod"
+
+# Model to activate by default (used when --model is not passed)
+DEFAULT_MODEL="qwen3-coder:latest"
+
+# All models stored on this pod — space-separated Ollama tags.
+# These get registered in opencode.jsonc and are warmed up with --all-models.
+WARMUP_MODELS="qwen3-coder:latest translategemma:27b"
+
+# Preferred GPU display name (partial match, case-insensitive). Empty = show all within budget.
+DEFAULT_GPU_TYPE="RTX PRO 6000"
+
+# Hard $/hr ceiling — only GPUs at or below this price are shown
+MAX_PRICE_PER_HR=2.50
+
+# Pod configuration
+CONTAINER_DISK_GB=15
+GPU_COUNT=1
+STARTUP_TIMEOUT=240 # seconds to wait for Ollama to become reachable
+WARMUP_NUM_CTX=32768 # context size used when pre-loading models into VRAM
+```
+
+## Usage
+
+```
+runpod-session.sh [OPTIONS]
+
+Options:
+ --model MODEL Ollama model tag to warm up (default: DEFAULT_MODEL in config)
+ --all-models Warm up ALL models listed in WARMUP_MODELS
+ --gpu-type 'NAME' Preferred GPU display name (partial match, case-insensitive)
+ --max-price PRICE Max $/hr ceiling (overrides config)
+ --new Force creation of a new pod (skip restart logic)
+ --stop Stop the current running pod
+ --status Show current session state and reachability
+ --help Show this help
+```
+
+### Typical workflow
+
+```bash
+# Start a session (warm up default model)
+runpod-session.sh
+
+# Start with a specific model
+runpod-session.sh --model qwen3-coder:latest
+
+# Warm up all configured models
+runpod-session.sh --all-models
+
+# Check what's running and how much it's costing
+runpod-session.sh --status
+
+# Stop the pod when done (pod is stopped, not terminated — network volume persists)
+runpod-session.sh --stop
+
+# Force a fresh pod (terminates any existing stopped pod, creates new)
+runpod-session.sh --new --gpu-type "RTX 4090" --max-price 1.80
+```
+
+## How it works
+
+1. **Existing pod check** — queries RunPod for any pod with "ollama" in its name.
+ - If running and reachable: skips straight to warmup.
+ - If stopped/exited: prompts to restart, delete, or abort.
+ - `--new` skips this check entirely.
+
+2. **GPU selection** — queries the RunPod GPU catalog for secure-cloud instances within your price ceiling, sorted by price. If `DEFAULT_GPU_TYPE` is set, matching GPUs are shown first. Each candidate is shown with its VRAM and price; you confirm, skip (`n`), or abort (`a`) for each one. If a GPU you confirm turns out to have no available machines (`SUPPLY_CONSTRAINT`), the script moves to the next candidate automatically.
+
+3. **Pod creation** — deploys `ollama/ollama:latest` on the chosen GPU with your network volume mounted at `/workspace`. Ollama is configured to listen on `0.0.0.0:11434` and keep models in VRAM for 1 hour after last use.
+
+4. **Startup wait** — polls `https://<pod-id>-11434.proxy.runpod.net/api/tags` every 5 seconds until Ollama responds (up to `STARTUP_TIMEOUT` seconds). You can run the same probe by hand — see the smoke-test sketch after this list.
+
+5. **opencode.jsonc patch** — updates three fields in your opencode config:
+ - `provider.runpod.options.baseURL` → the live pod URL
+ - `model` → `runpod/<DEFAULT_MODEL>`
+ - `provider.runpod.models` → merges all `WARMUP_MODELS` in (existing per-model settings are preserved via jq recursive merge)
+
+ A `.bak` copy is written before any changes.
+
+6. **Model warmup** — sends a short generation request to load the model into VRAM at `WARMUP_NUM_CTX` context length, so the first real request isn't slow.
+
+7. **State file** — saves pod ID, URL, model, and timestamp to `~/.config/runpod-session/state.json` for use by `--status` and `--stop`.
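+
+To reproduce steps 4 and 6 by hand, the requests the script sends boil down to two curl calls — a sketch, with `<pod-id>` standing in for your actual pod ID:
+
+```bash
+# Step 4's reachability probe — lists the models the pod knows about
+curl -s "https://<pod-id>-11434.proxy.runpod.net/api/tags" | jq -r '.models[].name'
+
+# Step 6's warmup request — loads a model into VRAM at the configured context size
+curl -s -X POST "https://<pod-id>-11434.proxy.runpod.net/api/generate" \
+  -H "Content-Type: application/json" \
+  -d '{"model": "qwen3-coder:latest", "prompt": "hi", "stream": false, "options": {"num_ctx": 32768}}'
+```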
+
+## opencode provider setup
+
+Your `~/.config/opencode/opencode.jsonc` needs a `runpod` provider block before running the script. The script will fill in the `baseURL` on each session start:
+
+```jsonc
+{
+ "provider": {
+ "runpod": {
+ "options": {
+ "baseURL": "" // filled in automatically by runpod-session.sh
+ },
+ "models": {}
+ }
+ },
+ "model": "runpod/qwen3-coder:latest"
+}
+```
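+
+After a session start the patched block might look like this (pod ID and model list are illustrative — the script fills in whatever your config and pod produce):
+
+```jsonc
+{
+  "provider": {
+    "runpod": {
+      "options": {
+        "baseURL": "https://abc123xyz-11434.proxy.runpod.net/v1"
+      },
+      "models": {
+        "qwen3-coder:latest": { "tools": true },
+        "translategemma:27b": { "tools": true }
+      }
+    }
+  },
+  "model": "runpod/qwen3-coder:latest"
+}
+```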
+
+## Files
+
+| Path | Purpose |
+|------|---------|
+| `~/.config/runpod-session/config` | Main config (sourced as bash) |
+| `~/.config/runpod-session/state.json` | Last session record |
+| `~/.config/opencode/opencode.jsonc` | Patched on each session start |
+| `~/.config/opencode/opencode.jsonc.bak` | Backup written before each patch |
+
+## Cost notes
+
+- **Running pod**: billed at the GPU hourly rate shown during selection.
+- **Stopped pod**: network volume storage continues at ~$0.002/hr (≈ $0.05/day, about $1.44 over 30 days) — terminate the pod if you no longer need it.
+- `--stop` stops (does not terminate) the pod so it can be quickly restarted without losing the volume.
diff --git a/runpod-session.sh b/runpod-session.sh
new file mode 100755
index 0000000..c81e8dd
--- /dev/null
+++ b/runpod-session.sh
@@ -0,0 +1,608 @@
+#!/usr/bin/env bash
+# runpod-session.sh — Manage RunPod Ollama sessions for opencode
+#
+# Usage:
+# runpod-session.sh [OPTIONS]
+#
+# Options:
+# --model MODEL Ollama model tag to warm up (e.g. qwen3-coder:latest)
+# Defaults to DEFAULT_MODEL in config
+# --all-models Warm up ALL models listed in WARMUP_MODELS config
+# --gpu-type 'NAME' Preferred GPU display name (partial match, case-insensitive)
+# --max-price PRICE Max $/hr ceiling (default: MAX_PRICE_PER_HR in config)
+# --new Force creation of a new pod (skip restart logic)
+# --stop Stop the current running pod
+# --status Show current session state and reachability
+# --help Show this help
+#
+# Requires: curl, jq
+# Config: ~/.config/runpod-session/config (auto-created on first run)
+
+set -euo pipefail
+
+# ─── Paths ────────────────────────────────────────────────────────────────────
+OPENCODE_CONFIG="$HOME/.config/opencode/opencode.jsonc"
+SESSION_CONFIG_DIR="$HOME/.config/runpod-session"
+SESSION_CONFIG="$SESSION_CONFIG_DIR/config"
+SESSION_STATE="$SESSION_CONFIG_DIR/state.json"
+RUNPOD_API="https://api.runpod.io/graphql"
+
+# ─── Defaults (overridden by config file) ─────────────────────────────────────
+OLLAMA_IMAGE="ollama/ollama:latest"
+NETWORK_VOLUME_NAME="danixland-storage"
+OPENCODE_PROVIDER="runpod"
+DEFAULT_MODEL="qwen3-coder:latest"
+WARMUP_MODELS="qwen3-coder:latest translategemma:27b"
+MAX_PRICE_PER_HR=2.50
+CONTAINER_DISK_GB=15
+DEFAULT_GPU_TYPE=""
+GPU_COUNT=1
+POLL_INTERVAL=5
+STARTUP_TIMEOUT=240
+WARMUP_NUM_CTX=32768
+
+# ─── Colors ───────────────────────────────────────────────────────────────────
+RED='\033[0;31m'; YELLOW='\033[1;33m'; GREEN='\033[0;32m'
+CYAN='\033[0;36m'; BOLD='\033[1m'; RESET='\033[0m'
+
+log() { echo -e "${CYAN}[runpod]${RESET} $*" >&2; }
+ok() { echo -e "${GREEN}[ok]${RESET} $*" >&2; }
+warn() { echo -e "${YELLOW}[warn]${RESET} $*" >&2; }
+die() { echo -e "${RED}[error]${RESET} $*" >&2; exit 1; }
+
+# ─── Dependency check ─────────────────────────────────────────────────────────
+for cmd in curl jq; do
+ command -v "$cmd" &>/dev/null || die "Required command not found: $cmd"
+done
+
+# ─── Config bootstrap ─────────────────────────────────────────────────────────
+mkdir -p "$SESSION_CONFIG_DIR"
+
+if [[ ! -f "$SESSION_CONFIG" ]]; then
+ warn "No config found — creating $SESSION_CONFIG"
+ cat > "$SESSION_CONFIG" <<'CONF'
+# runpod-session configuration — edit then re-run.
+
+RUNPOD_API_KEY=""
+
+# Network volume name as shown in RunPod dashboard
+NETWORK_VOLUME_NAME="danixland-storage"
+
+# Optional: path where Ollama looks for models on the network volume.
+# Passed to the pod as OLLAMA_MODELS when set; leave commented for the image default.
+# OLLAMA_MODELS_PATH="/workspace/models"
+
+# Must match the key in your opencode.jsonc "provider" block
+OPENCODE_PROVIDER="runpod"
+
+# Model to activate by default (used when --model is not passed)
+DEFAULT_MODEL="qwen3-coder:latest"
+
+# All models that live on this pod — space-separated Ollama tags.
+# These get registered in opencode.jsonc and are warmed up with --all-models.
+WARMUP_MODELS="qwen3-coder:latest translategemma:27b"
+
+# GPU selection
+DEFAULT_GPU_TYPE="" # e.g. "RTX PRO 6000" — empty = cheapest available
+MAX_PRICE_PER_HR=2.50 # hard $/hr ceiling
+
+# Pod configuration
+CONTAINER_DISK_GB=15
+GPU_COUNT=1
+STARTUP_TIMEOUT=240 # seconds before giving up waiting for Ollama
+WARMUP_NUM_CTX=32768 # num_ctx used when warming up models into VRAM
+CONF
+ echo ""
+ echo -e "${YELLOW}Edit ${BOLD}$SESSION_CONFIG${RESET}${YELLOW}, set RUNPOD_API_KEY, then re-run.${RESET}"
+ exit 0
+fi
+
+# shellcheck source=/dev/null
+source "$SESSION_CONFIG"
+[[ -z "${RUNPOD_API_KEY:-}" ]] && die "RUNPOD_API_KEY not set in $SESSION_CONFIG"
+
+# ─── Argument parsing ─────────────────────────────────────────────────────────
+OPT_MODEL="${DEFAULT_MODEL}"
+OPT_ALL_MODELS=0
+OPT_GPU_TYPE="${DEFAULT_GPU_TYPE:-}"
+OPT_MAX_PRICE="${MAX_PRICE_PER_HR}"
+OPT_FORCE_NEW=0
+OPT_STOP=0
+OPT_STATUS=0
+
+while [[ $# -gt 0 ]]; do
+ case "$1" in
+ --model) OPT_MODEL="$2"; shift 2 ;;
+ --all-models) OPT_ALL_MODELS=1; shift ;;
+ --gpu-type) OPT_GPU_TYPE="$2"; shift 2 ;;
+ --max-price) OPT_MAX_PRICE="$2"; shift 2 ;;
+ --new) OPT_FORCE_NEW=1; shift ;;
+ --stop) OPT_STOP=1; shift ;;
+ --status) OPT_STATUS=1; shift ;;
+ --help|-h) sed -n '2,17p' "$0"; exit 0 ;;
+ *) die "Unknown option: $1 (use --help)" ;;
+ esac
+done
+
+# ─── RunPod GraphQL helper ────────────────────────────────────────────────────
+gql() {
+ local err
+ local response
+  # `|| response=""` stops set -e from aborting mid-assignment on curl failure,
+  # so the empty-response check below can report the error properly.
+  response=$(curl --ipv4 -s \
+    -H "Content-Type: application/json" \
+    -H "Authorization: Bearer $RUNPOD_API_KEY" \
+    -d "$1" \
+    "$RUNPOD_API") || response=""
+  if [[ -z "$response" ]]; then
+    die "RunPod API request failed — check API key and connectivity"
+  fi
+ err=$(echo "$response" | jq -r '.errors[0].message // empty' 2>/dev/null || true)
+ [[ -n "$err" ]] && die "RunPod API error: $err"
+ echo "$response"
+}
+
+# ─── Pod queries ──────────────────────────────────────────────────────────────
+get_pods() {
+ gql '{"query":"{ myself { pods { id name desiredStatus costPerHr runtime { uptimeInSeconds } machine { gpuDisplayName } } } }"}'
+}
+
+find_ollama_pod() {
+ # Returns compact JSON of the first pod matching "ollama" in its name, or empty string
+ local result
+ result=$(echo "$1" | jq -c \
+ '.data.myself.pods[] | select(.name | test("ollama"; "i"))' 2>/dev/null | head -1)
+ echo "$result"
+}
+
+# ─── --status subcommand ──────────────────────────────────────────────────────
+cmd_status() {
+ echo ""
+ echo -e "${BOLD}runpod-session status${RESET}"
+ echo ""
+
+ # ── Live pod data from API ────────────────────────────────────────────────
+ local pods_json pod_json
+ pods_json=$(get_pods)
+ pod_json=$(find_ollama_pod "$pods_json")
+
+ if [[ -n "$pod_json" ]]; then
+    local pod_id status gpu cost_hr uptime_sec cost_so_far uptime_human
+ pod_id=$(echo "$pod_json" | jq -r '.id')
+ status=$(echo "$pod_json" | jq -r '.desiredStatus')
+ gpu=$(echo "$pod_json" | jq -r '.machine.gpuDisplayName // "?"')
+ cost_hr=$(echo "$pod_json" | jq -r '.costPerHr // 0')
+ uptime_sec=$(echo "$pod_json" | jq -r '.runtime.uptimeInSeconds // 0')
+
+ # Calculate cost: (uptime_seconds / 3600) * cost_per_hr
+ cost_so_far=$(echo "$uptime_sec $cost_hr" | awk '{printf "%.4f", ($1/3600)*$2}')
+ uptime_human=$(echo "$uptime_sec" | awk '{
+ h=int($1/3600); m=int(($1%3600)/60); s=$1%60
+ if (h>0) printf "%dh %dm %ds", h, m, s
+ else if (m>0) printf "%dm %ds", m, s
+ else printf "%ds", s
+ }')
+
+ echo -e " Pod: ${BOLD}${pod_id}${RESET}"
+ echo -e " GPU: ${gpu}"
+ echo -e " Status: ${status}"
+ echo -e " Rate: \$${cost_hr}/hr"
+ if [[ "$status" == "RUNNING" ]]; then
+ echo -e " Uptime: ${uptime_human}"
+ echo -e " Cost: \$${cost_so_far} this session"
+ else
+ echo -e " Uptime: —"
+ echo -e " Cost: —"
+ fi
+ echo ""
+
+ # ── Ollama reachability ───────────────────────────────────────────────
+ local base="https://${pod_id}-11434.proxy.runpod.net"
+ local tags_response
+ tags_response=$(curl -s --ipv4 --max-time 5 "${base}/api/tags" 2>/dev/null || true)
+ if echo "$tags_response" | jq -e '.models' > /dev/null 2>&1; then
+ echo -e " Ollama: ${GREEN}reachable${RESET}"
+ local models
+ models=$(echo "$tags_response" | jq -r '.models[].name' 2>/dev/null || true)
+ if [[ -n "$models" ]]; then
+ echo -e " In VRAM: $(echo "$models" | tr '\n' ' ')"
+ else
+ echo -e " In VRAM: none (model will load on first request)"
+ fi
+ else
+ echo -e " Ollama: ${YELLOW}not reachable${RESET}"
+ fi
+ else
+ echo -e " ${YELLOW}No Ollama pod found in your account.${RESET}"
+ fi
+
+ # ── Saved session state ───────────────────────────────────────────────────
+ if [[ -f "$SESSION_STATE" ]]; then
+ echo ""
+ echo -e " ${BOLD}Last session record:${RESET}"
+ jq -r '" Started: \(.started_at)\n Model: \(.model)\n URL: \(.ollama_url)"' \
+ "$SESSION_STATE"
+ fi
+
+ echo ""
+}
+
+# ─── Pod lifecycle ────────────────────────────────────────────────────────────
+restart_pod() {
+ log "Restarting pod $1 ..."
+ gql "{\"query\":\"mutation { podResume(input: { podId: \\\"$1\\\", gpuCount: $GPU_COUNT }) { id desiredStatus } }\"}" > /dev/null
+}
+
+stop_pod() {
+ log "Stopping pod $1 ..."
+ gql "{\"query\":\"mutation { podStop(input: { podId: \\\"$1\\\" }) { id desiredStatus } }\"}" > /dev/null
+ ok "Pod stopped. Storage costs continue while stopped (~\$0.002/hr)."
+}
+
+terminate_pod() {
+ log "Terminating pod $1 ..."
+ gql "{\"query\":\"mutation { podTerminate(input: { podId: \\\"$1\\\" }) }\"}" > /dev/null
+ ok "Pod terminated."
+}
+
+# ─── --stop subcommand ────────────────────────────────────────────────────────
+cmd_stop() {
+ local pod_id=""
+ if [[ -f "$SESSION_STATE" ]]; then
+ pod_id=$(jq -r '.pod_id' "$SESSION_STATE")
+ else
+ local pods_json pod_json
+ pods_json=$(get_pods)
+ pod_json=$(find_ollama_pod "$pods_json")
+ [[ -z "$pod_json" ]] && die "No Ollama pod found in your account."
+ pod_id=$(echo "$pod_json" | jq -r '.id')
+ fi
+ echo -n " Stop pod $pod_id? [y/N] "
+ read -r confirm
+ [[ "${confirm,,}" != "y" ]] && { log "Aborted."; exit 0; }
+ stop_pod "$pod_id"
+ rm -f "$SESSION_STATE"
+}
+
+# ─── GPU selection ────────────────────────────────────────────────────────────
+# Returns a JSON array of candidates sorted by price (preferred first if set).
+# Caller iterates and tries each until one succeeds.
+get_gpu_candidates() {
+ local preferred="$1" max_price="$2"
+ log "Querying available GPUs (secure cloud, max \$$max_price/hr) ..." >&2
+
+ local result candidates
+ result=$(curl -s --ipv4 \
+ -H "Content-Type: application/json" \
+ -H "Authorization: Bearer $RUNPOD_API_KEY" \
+ -d '{"query":"{ gpuTypes { id displayName memoryInGb secureCloud lowestPrice(input: { gpuCount: 1 }) { uninterruptablePrice } } }"}' \
+ "$RUNPOD_API")
+
+ candidates=$(echo "$result" | jq -c \
+ --arg max "$max_price" \
+ '[.data.gpuTypes[]
+ | select(.secureCloud == true)
+ | select((.lowestPrice.uninterruptablePrice // 0) > 0)
+ | select(.lowestPrice.uninterruptablePrice <= ($max | tonumber))
+ | { id: .id, name: .displayName, vram: .memoryInGb,
+ price: .lowestPrice.uninterruptablePrice }
+ ] | sort_by(.price)')
+
+ [[ -z "$candidates" || "$candidates" == "[]" ]] && \
+ die "No GPUs on secure cloud within \$$max_price/hr. Try --max-price."
+
+ # Bubble preferred GPU to front so it's tried first
+ if [[ -n "$preferred" ]]; then
+ local reordered
+ reordered=$(echo "$candidates" | jq -c \
+ --arg p "$preferred" \
+ '( [.[] | select(.name | test($p; "i"))] ) +
+ ( [.[] | select(.name | test($p; "i") | not)] )')
+ local pcount
+ pcount=$(echo "$reordered" | jq 'map(select(.name | test($p; "i"))) | length' --arg p "$preferred" 2>/dev/null || echo 0)
+ if [[ "$pcount" -eq 0 ]]; then
+ warn "Preferred GPU '$preferred' not in catalog within price limit. Will try all." >&2
+ fi
+ candidates="$reordered"
+ fi
+
+ echo "$candidates"
+}
+
+# ─── Network volume ───────────────────────────────────────────────────────────
+get_network_volume_id() {
+ local result vol_id
+ result=$(curl -s --ipv4 \
+ -H "Content-Type: application/json" \
+ -H "Authorization: Bearer $RUNPOD_API_KEY" \
+ -d '{"query":"{ myself { networkVolumes { id name } } }"}' \
+ "$RUNPOD_API")
+ vol_id=$(echo "$result" | jq -r \
+ --arg n "$NETWORK_VOLUME_NAME" \
+ '.data.myself.networkVolumes[] | select(.name == $n) | .id')
+ [[ -z "$vol_id" ]] && die "Network volume '$NETWORK_VOLUME_NAME' not found."
+ echo "$vol_id"
+}
+
+# ─── Create pod ───────────────────────────────────────────────────────────────
+create_pod() {
+ local gpu_id="$1" vol_id="$2"
+ log "Creating pod ..."
+
+ # Build env array — omit OLLAMA_MODELS entry if path is unset
+ local env_json
+ env_json='[{"key":"OLLAMA_HOST","value":"0.0.0.0"},{"key":"OLLAMA_LOAD_TIMEOUT","value":"10m"},{"key":"OLLAMA_KEEP_ALIVE","value":"1h"}]'
+ if [[ -n "${OLLAMA_MODELS_PATH:-}" ]]; then
+ env_json=$(echo "$env_json" | jq \
+ --arg v "$OLLAMA_MODELS_PATH" \
+ '. + [{"key":"OLLAMA_MODELS","value":$v}]')
+ fi
+
+ local payload
+ payload=$(jq -n \
+ --arg gpu_id "$gpu_id" \
+ --arg vol_id "$vol_id" \
+ --arg image "$OLLAMA_IMAGE" \
+ --argjson gpu_count "$GPU_COUNT" \
+ --argjson disk "$CONTAINER_DISK_GB" \
+ --argjson env "$env_json" \
+ '{query: "mutation($input: PodFindAndDeployOnDemandInput!) { podFindAndDeployOnDemand(input: $input) { id } }",
+ variables: { input: {
+ cloudType: "SECURE",
+ gpuCount: $gpu_count,
+ volumeInGb: 0,
+ containerDiskInGb: $disk,
+ minVcpuCount: 4,
+ minMemoryInGb: 15,
+ gpuTypeId: $gpu_id,
+ name: "ollama-session",
+ imageName: $image,
+ ports: "11434/http",
+ volumeMountPath: "/workspace",
+ networkVolumeId: $vol_id,
+ env: $env
+ }}}')
+
+ local result pod_id err_code
+ result=$(curl -s --ipv4 \
+ -H "Content-Type: application/json" \
+ -H "Authorization: Bearer $RUNPOD_API_KEY" \
+ -d "$payload" \
+ "$RUNPOD_API")
+
+ pod_id=$(echo "$result" | jq -r '.data.podFindAndDeployOnDemand.id // empty')
+ if [[ -n "$pod_id" ]]; then
+ echo "$pod_id"
+ return 0
+ fi
+
+ err_code=$(echo "$result" | jq -r '.errors[0].extensions.code // empty')
+ if [[ "$err_code" == "SUPPLY_CONSTRAINT" ]]; then
+ warn "No supply for GPU $gpu_id — trying next candidate ..." >&2
+ return 1
+ fi
+ die "Pod creation failed. Response: $result"
+}
+
+# ─── Wait for Ollama ──────────────────────────────────────────────────────────
+wait_for_pod() {
+ local pod_id="$1"
+ local url="https://${pod_id}-11434.proxy.runpod.net"
+ local elapsed=0
+ log "Polling $url/api/tags (timeout: ${STARTUP_TIMEOUT}s) ..."
+ while (( elapsed < STARTUP_TIMEOUT )); do
+ local response
+ response=$(curl -s --ipv4 --max-time 5 "${url}/api/tags" 2>/dev/null || true)
+ if echo "$response" | jq -e '.models' > /dev/null 2>&1; then
+ echo ""; ok "Ollama is up."; return 0
+ fi
+ printf " [%3ds] waiting...\r" "$elapsed"
+ sleep "$POLL_INTERVAL"
+ (( elapsed += POLL_INTERVAL ))
+ done
+ echo ""
+ die "Timed out after ${STARTUP_TIMEOUT}s. Check the RunPod dashboard."
+}
+
+# ─── Patch opencode.jsonc ─────────────────────────────────────────────────────
+#
+# What this changes in your opencode.jsonc:
+# .provider.runpod.options.baseURL → https://<pod-id>-11434.proxy.runpod.net/v1
+# .model → runpod/qwen3-coder:latest (DEFAULT_MODEL)
+# .provider.runpod.models → merges all WARMUP_MODELS in (preserving
+# any existing per-model config you have)
+#
+# Uses jq's `*` (recursive merge) so your existing model overrides are never clobbered.
+# A .bak backup is written before any changes.
+#
+patch_opencode_config() {
+ local new_url="$1" # full URL including /v1
+
+ [[ ! -f "$OPENCODE_CONFIG" ]] && die "opencode config not found at $OPENCODE_CONFIG"
+ cp "$OPENCODE_CONFIG" "${OPENCODE_CONFIG}.bak"
+
+ # Build a jq object for all warmup models: { "model:tag": {"tools":true}, ... }
+ # tools:true is a safe default — it won't override existing per-model settings
+ # because we merge with * where existing config wins on conflicts.
+ local models_patch="{}"
+ for m in $WARMUP_MODELS; do
+ models_patch=$(printf '%s' "$models_patch" \
+ | jq --arg m "$m" '. + {($m): {"tools": true}}')
+ done
+
+ local tmp
+ tmp=$(mktemp)
+ jq \
+ --arg provider "$OPENCODE_PROVIDER" \
+ --arg url "$new_url" \
+ --arg model "${OPENCODE_PROVIDER}/${DEFAULT_MODEL}" \
+ --argjson patch "$models_patch" \
+ '
+ .provider[$provider].options.baseURL = $url
+ | .model = $model
+ | .provider[$provider].models = (
+ $patch * (.provider[$provider].models // {})
+ )
+ ' "$OPENCODE_CONFIG" > "$tmp" && mv "$tmp" "$OPENCODE_CONFIG"
+
+ ok "opencode.json patched:"
+ log " provider.${OPENCODE_PROVIDER}.options.baseURL = $new_url"
+ log " model = ${OPENCODE_PROVIDER}/${DEFAULT_MODEL}"
+}
+
+# ─── Warm up one model ────────────────────────────────────────────────────────
+warmup_model() {
+ local pod_id="$1" model="$2"
+ local base="https://${pod_id}-11434.proxy.runpod.net"
+
+ log "Warming up '$model' into VRAM ..."
+
+  # --fail makes HTTP errors (e.g. an unknown model tag) take the warn branch
+  # instead of being reported as a successful warmup.
+  if curl -s --fail --ipv4 --max-time 300 -X POST "${base}/api/generate" \
+    -H "Content-Type: application/json" \
+    -d "{\"model\": \"$model\", \"prompt\": \"hi\", \"stream\": false, \"options\": {\"num_ctx\": ${WARMUP_NUM_CTX:-32768}}}" \
+    > /dev/null 2>&1; then
+ ok " '$model' is loaded."
+ else
+ warn " Warmup for '$model' failed — model will load on first use."
+ fi
+}
+
+# ─── State persistence ────────────────────────────────────────────────────────
+save_state() {
+ jq -n \
+ --arg pod_id "$1" \
+ --arg url "$2" \
+ --arg model "$3" \
+ --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
+ '{ pod_id: $pod_id, ollama_url: $url, model: $model, started_at: $ts }' \
+ > "$SESSION_STATE"
+}
+
+# ─── Main ─────────────────────────────────────────────────────────────────────
+main() {
+ [[ $OPT_STATUS -eq 1 ]] && { cmd_status; exit 0; }
+ [[ $OPT_STOP -eq 1 ]] && { cmd_stop; exit 0; }
+
+ echo -e "${BOLD}runpod-session${RESET} — Ollama on RunPod → opencode"
+ echo ""
+
+ local pod_id=""
+ local _skip_wait=0
+
+ # ── 1. Existing pod check ─────────────────────────────────────────────────
+ if [[ $OPT_FORCE_NEW -eq 0 ]]; then
+ log "Checking for existing Ollama pods ..."
+ local pods_json pod_json
+ pods_json=$(get_pods)
+ pod_json=$(find_ollama_pod "$pods_json")
+
+ if [[ -n "$pod_json" ]]; then
+ pod_id=$(echo "$pod_json" | jq -r '.id')
+ local status gpu cost
+ status=$(echo "$pod_json" | jq -r '.desiredStatus')
+ gpu=$(echo "$pod_json" | jq -r '.machine.gpuDisplayName // "?"')
+ cost=$(echo "$pod_json" | jq -r '.costPerHr // "?"')
+
+ echo -e " Found: ${BOLD}${pod_id}${RESET} GPU: ${gpu} \$${cost}/hr Status: ${status}"
+ echo ""
+
+ case "$status" in
+ RUNNING)
+ local _check
+ _check=$(curl -s --ipv4 --max-time 5 "https://${pod_id}-11434.proxy.runpod.net/api/tags" 2>/dev/null || true)
+ if echo "$_check" | jq -e '.models' > /dev/null 2>&1; then
+ ok "Already running and reachable — skipping startup sequence."
+ _skip_wait=1
+ else
+ log "Pod is running but Ollama not yet reachable — waiting ..."
+ fi
+ ;;
+ EXITED|STOPPED)
+ echo -n " [R]estart [D]elete and create new [A]bort [R/d/a]: "
+ read -r choice
+ case "${choice,,}" in
+ d) terminate_pod "$pod_id"; pod_id="" ;;
+ a) log "Aborted."; exit 0 ;;
+ *) restart_pod "$pod_id" ;;
+ esac
+ ;;
+ *)
+ warn "Unexpected pod state '$status' — ignoring this pod."
+ pod_id=""
+ ;;
+ esac
+ else
+ log "No existing Ollama pod found."
+ fi
+ fi
+
+ # ── 2. Create new pod if needed ───────────────────────────────────────────
+ if [[ -z "$pod_id" ]]; then
+ local vol_id candidates gpu_json gpu_id gpu_name gpu_vram gpu_price
+ vol_id=$(get_network_volume_id)
+ ok "Volume: ${NETWORK_VOLUME_NAME} ($vol_id)"
+
+ candidates=$(get_gpu_candidates "$OPT_GPU_TYPE" "$OPT_MAX_PRICE")
+ local count
+ count=$(echo "$candidates" | jq 'length')
+ [[ "$count" -eq 0 ]] && die "No GPU candidates found."
+
+ log "${count} GPU options within budget. Will prompt for each."
+
+ local i=0
+ while [[ $i -lt $count ]]; do
+ gpu_json=$(echo "$candidates" | jq -c --argjson i "$i" '.[$i]')
+ gpu_id=$(echo "$gpu_json" | jq -r '.id')
+ gpu_name=$(echo "$gpu_json" | jq -r '.name')
+ gpu_vram=$(echo "$gpu_json" | jq -r '.vram')
+ gpu_price=$(echo "$gpu_json" | jq -r '.price')
+
+ echo ""
+ echo -e " ${BOLD}${gpu_name}${RESET} ${gpu_vram}GB VRAM \$${gpu_price}/hr"
+ echo -n " Create pod with this GPU? [Y/n/a(bort)] "
+ read -r choice
+ case "${choice,,}" in
+ a) log "Aborted."; exit 0 ;;
+ n) (( i++ )) || true; continue ;;
+ esac
+
+ if pod_id=$(create_pod "$gpu_id" "$vol_id"); then
+ ok "Pod created: $pod_id GPU: ${gpu_name}"
+ break
+ fi
+ (( i++ )) || true
+ done
+
+ [[ -z "$pod_id" ]] && die "All ${count} GPU candidates exhausted. Try --max-price or later."
+ fi
+
+ # ── 3. Wait for Ollama if needed ─────────────────────────────────────────
+ if [[ "${_skip_wait:-0}" != "1" ]]; then
+ wait_for_pod "$pod_id"
+ fi
+
+ # ── 4. Final URL ──────────────────────────────────────────────────────────
+ local ollama_url="https://${pod_id}-11434.proxy.runpod.net/v1"
+
+  # ── 5. Patch opencode.jsonc ───────────────────────────────────────────────
+ patch_opencode_config "$ollama_url"
+
+ # ── 6. Warmup ─────────────────────────────────────────────────────────────
+ if [[ $OPT_ALL_MODELS -eq 1 ]]; then
+ for m in $WARMUP_MODELS; do
+ warmup_model "$pod_id" "$m"
+ done
+ elif [[ -n "$OPT_MODEL" ]]; then
+ warmup_model "$pod_id" "$OPT_MODEL"
+ else
+ warn "No warmup requested. Use --model MODEL or --all-models."
+ fi
+
+ # ── 7. Save state ─────────────────────────────────────────────────────────
+ save_state "$pod_id" "$ollama_url" "$OPT_MODEL"
+
+ # ── 8. Done ───────────────────────────────────────────────────────────────
+ echo ""
+ echo -e "${BOLD}${GREEN}Ready.${RESET}"
+ printf " Pod: %s\n" "$pod_id"
+ printf " URL: %s\n" "$ollama_url"
+ printf " Run: opencode\n"
+ echo ""
+}
+
+main "$@"