 -rw-r--r-- CLAUDE.md         |  51
 -rw-r--r-- README.md         | 154
 -rwxr-xr-x runpod-session.sh | 608
 3 files changed, 813 insertions(+), 0 deletions(-)
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..4b1b242
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,51 @@

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What this is

A single bash script (`runpod-session.sh`) that manages the RunPod GPU pod lifecycle for running Ollama models, then patches `~/.config/opencode/opencode.jsonc` so opencode points at the live pod.

## Running / testing

```bash
# Check syntax
bash -n runpod-session.sh

# Lint
shellcheck runpod-session.sh

# Run (requires RUNPOD_API_KEY in ~/.config/runpod-session/config)
./runpod-session.sh --status
./runpod-session.sh --model qwen3-coder:latest
./runpod-session.sh --stop
./runpod-session.sh --all-models
./runpod-session.sh --new --gpu-type "RTX PRO 6000" --max-price 1.50
```

## Architecture

All logic is in `main()`, which runs these steps in order:

1. **Existing pod check** — queries the RunPod GraphQL API, matches a pod by name containing "ollama"
2. **Pod creation** — if none found: picks the cheapest secure-cloud GPU under `MAX_PRICE_PER_HR`, creates a pod from the `ollama/ollama:latest` image with the network volume mounted at `/workspace`
3. **Wait for Ollama** — polls `https://<pod-id>-11434.proxy.runpod.net/api/tags` until `.models` appears
4. **Patch opencode.jsonc** — updates `baseURL` and the active `model`, and merges `WARMUP_MODELS` into the provider models block using a jq `*` merge (existing config wins on conflicts)
5. **Warmup** — POSTs to `/api/generate` with a dummy prompt to load the model into VRAM at `WARMUP_NUM_CTX` context length
6. **Save state** — writes `~/.config/runpod-session/state.json` with pod_id, url, model, timestamp

## Key design constraints

- All RunPod calls go through `gql()` — a single curl wrapper that exits on API errors
- The pod is identified by name matching `test("ollama"; "i")` — not by ID — so the name `ollama-session` set at creation must not change
- `patch_opencode_config()` writes a `.bak` before touching opencode.jsonc; the jq `*` merge means existing per-model settings survive (see the sketch below)
- The script only sets the `OLLAMA_MODELS` env var on the pod when `OLLAMA_MODELS_PATH` is defined in config — define it there if models live outside the default location on the network volume
- GPU selection only considers `secureCloud == true` GPU types; community cloud is excluded
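A stand-alone sketch of that merge rule (the model names come from the defaults; the per-model values here are made up):

```bash
existing='{"qwen3-coder:latest":{"tools":false,"temperature":0.2}}'
defaults='{"qwen3-coder:latest":{"tools":true},"translategemma:27b":{"tools":true}}'

# `*` merges recursively; on conflicts the right-hand side wins, so putting
# the existing models block on the right preserves user overrides:
jq -n --argjson d "$defaults" --argjson e "$existing" '$d * $e'
# → qwen3-coder:latest keeps tools:false and temperature:0.2;
#   translategemma:27b is added with the tools:true default
```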
## Config and state files

| Path | Purpose |
|------|---------|
| `~/.config/runpod-session/config` | Sourced as bash; holds the API key and defaults |
| `~/.config/runpod-session/state.json` | Last session record (pod_id, url, model, timestamp) |
| `~/.config/opencode/opencode.jsonc` | Patched in place; `.bak` written before changes |

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..d12d1ec
--- /dev/null
+++ b/README.md
@@ -0,0 +1,154 @@

# runpod-session.sh

A single bash script that spins up (or resumes) a RunPod GPU pod running Ollama, waits for it to become reachable, warms your models into VRAM, and patches your `opencode.jsonc` to point at the live pod — all in one command.

## Requirements

- `curl` and `jq`
- A [RunPod](https://runpod.io) account with:
  - An API key
  - A network volume (for persistent model storage)
- [opencode](https://opencode.ai) installed and configured with a `runpod` provider block

## Installation

```bash
chmod +x runpod-session.sh
# optionally symlink to somewhere on your PATH:
ln -s "$(pwd)/runpod-session.sh" ~/.local/bin/runpod-session
```

On first run the script creates `~/.config/runpod-session/config` with defaults and exits — edit it, then re-run.

## Configuration

Config file: `~/.config/runpod-session/config`

```bash
# runpod-session configuration

RUNPOD_API_KEY="rpa_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Network volume name exactly as shown in the RunPod dashboard
NETWORK_VOLUME_NAME="my-storage"

# Path where Ollama looks for models on the network volume
OLLAMA_MODELS_PATH="/workspace/models"

# Must match the key in your opencode.jsonc "provider" block
OPENCODE_PROVIDER="runpod"

# Model to activate by default (used when --model is not passed)
DEFAULT_MODEL="qwen3-coder:latest"

# All models stored on this pod — space-separated Ollama tags.
# These get registered in opencode.jsonc and are warmed up with --all-models.
WARMUP_MODELS="qwen3-coder:latest translategemma:27b"

# Preferred GPU display name (partial match, case-insensitive). Empty = show all within budget.
DEFAULT_GPU_TYPE="RTX PRO 6000"

# Hard $/hr ceiling — only GPUs at or below this price are shown
MAX_PRICE_PER_HR=2.50

# Pod configuration
CONTAINER_DISK_GB=15
GPU_COUNT=1
STARTUP_TIMEOUT=240   # seconds to wait for Ollama to become reachable
WARMUP_NUM_CTX=32768  # context size used when pre-loading models into VRAM
```

## Usage

```
runpod-session.sh [OPTIONS]

Options:
  --model MODEL       Ollama model tag to warm up (default: DEFAULT_MODEL in config)
  --all-models        Warm up ALL models listed in WARMUP_MODELS
  --gpu-type 'NAME'   Preferred GPU display name (partial match, case-insensitive)
  --max-price PRICE   Max $/hr ceiling (overrides config)
  --new               Force creation of a new pod (skip restart logic)
  --stop              Stop the current running pod
  --status            Show current session state and reachability
  --help              Show this help
```

### Typical workflow

```bash
# Start a session (warm up the default model)
runpod-session.sh

# Start with a specific model
runpod-session.sh --model qwen3-coder:latest

# Warm up all configured models
runpod-session.sh --all-models

# Check what's running and how much it's costing
runpod-session.sh --status

# Stop the pod when done (pod is stopped, not terminated — the network volume persists)
runpod-session.sh --stop

# Force a fresh pod (terminates any existing stopped pod, creates a new one)
runpod-session.sh --new --gpu-type "RTX 4090" --max-price 1.80
```

## How it works

1. **Existing pod check** — queries RunPod for any pod with "ollama" in its name.
   - If running and reachable: skips straight to warmup.
   - If stopped/exited: prompts to restart, delete, or abort.
   - `--new` skips this check entirely.

2. **GPU selection** — queries the RunPod GPU catalog for secure-cloud instances within your price ceiling, sorted by price. If `DEFAULT_GPU_TYPE` is set, matching GPUs are shown first. Each candidate is shown with its VRAM and price; you confirm, skip (`n`), or abort (`a`) each one. If a GPU you confirm turns out to have no available machines (`SUPPLY_CONSTRAINT`), the script automatically moves on to the next candidate.

3. **Pod creation** — deploys `ollama/ollama:latest` on the chosen GPU with your network volume mounted at `/workspace`. Ollama is configured to listen on `0.0.0.0:11434` and keep models in VRAM for 1 hour after last use.

4. **Startup wait** — polls `https://<pod-id>-11434.proxy.runpod.net/api/tags` every 5 seconds until Ollama responds (up to `STARTUP_TIMEOUT` seconds); the same probe is sketched after this list.

5. **opencode.jsonc patch** — updates three fields in your opencode config:
   - `provider.runpod.options.baseURL` → the live pod URL
   - `model` → `runpod/<DEFAULT_MODEL>`
   - `provider.runpod.models` → merges all `WARMUP_MODELS` in (existing per-model settings are preserved via jq's recursive merge)

   A `.bak` copy is written before any changes.

6. **Model warmup** — sends a short generation request to load the model into VRAM at `WARMUP_NUM_CTX` context length, so the first real request isn't slow.

7. **State file** — saves the pod ID, URL, model, and timestamp to `~/.config/runpod-session/state.json` for use by `--status` and `--stop`.
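You can run the same reachability probe by hand at any point. `abc123` below is a placeholder pod ID (use the one printed by `--status`):

```bash
POD_ID="abc123"   # placeholder: substitute your real pod ID
curl -s --max-time 5 "https://${POD_ID}-11434.proxy.runpod.net/api/tags" \
  | jq -e '.models' >/dev/null \
  && echo "Ollama reachable" \
  || echo "not reachable yet"
```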
## opencode provider setup

Your `~/.config/opencode/opencode.jsonc` needs a `runpod` provider block before you run the script; the script fills in the `baseURL` on each session start. The patch step rewrites this file with jq, which reads strict JSON only, so keep it free of `//` comments or the patch will fail:

```jsonc
{
  "provider": {
    "runpod": {
      "options": {
        "baseURL": ""
      },
      "models": {}
    }
  },
  "model": "runpod/qwen3-coder:latest"
}
```

## Files

| Path | Purpose |
|------|---------|
| `~/.config/runpod-session/config` | Main config (sourced as bash) |
| `~/.config/runpod-session/state.json` | Last session record |
| `~/.config/opencode/opencode.jsonc` | Patched on each session start |
| `~/.config/opencode/opencode.jsonc.bak` | Backup written before each patch |

## Cost notes

- **Running pod**: billed at the GPU hourly rate shown during selection.
- **Stopped pod**: network volume storage continues at ~$0.002/hr — terminate the pod if you no longer need it.
- `--stop` stops (does not terminate) the pod so it can be restarted quickly without losing the volume.
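For a quick estimate of what a session has cost so far, this is the same arithmetic `--status` uses, as a one-liner (uptime and rate are example values):

```bash
# 9000 s of uptime at $1.20/hr → (9000/3600) * 1.20 = 3.0000
echo "9000 1.20" | awk '{printf "%.4f\n", ($1/3600)*$2}'
```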
diff --git a/runpod-session.sh b/runpod-session.sh
new file mode 100755
index 0000000..c81e8dd
--- /dev/null
+++ b/runpod-session.sh
@@ -0,0 +1,608 @@

#!/usr/bin/env bash
# runpod-session.sh — Manage RunPod Ollama sessions for opencode
#
# Usage:
#   runpod-session.sh [OPTIONS]
#
# Options:
#   --model MODEL       Ollama model tag to warm up (e.g. qwen3-coder:latest)
#                       Defaults to DEFAULT_MODEL in config
#   --all-models        Warm up ALL models listed in WARMUP_MODELS config
#   --gpu-type 'NAME'   Preferred GPU display name (partial match, case-insensitive)
#   --max-price PRICE   Max $/hr ceiling (default: MAX_PRICE_PER_HR in config)
#   --new               Force creation of a new pod (skip restart logic)
#   --stop              Stop the current running pod
#   --status            Show current session state and reachability
#   --help              Show this help
#
# Requires: curl, jq
# Config:   ~/.config/runpod-session/config (auto-created on first run)

set -euo pipefail

# ─── Paths ────────────────────────────────────────────────────────────────────
OPENCODE_CONFIG="$HOME/.config/opencode/opencode.jsonc"
SESSION_CONFIG_DIR="$HOME/.config/runpod-session"
SESSION_CONFIG="$SESSION_CONFIG_DIR/config"
SESSION_STATE="$SESSION_CONFIG_DIR/state.json"
RUNPOD_API="https://api.runpod.io/graphql"

# ─── Defaults (overridden by config file) ─────────────────────────────────────
OLLAMA_IMAGE="ollama/ollama:latest"
NETWORK_VOLUME_NAME="danixland-storage"
OPENCODE_PROVIDER="runpod"
DEFAULT_MODEL="qwen3-coder:latest"
WARMUP_MODELS="qwen3-coder:latest translategemma:27b"
MAX_PRICE_PER_HR=2.50
CONTAINER_DISK_GB=15
DEFAULT_GPU_TYPE=""
GPU_COUNT=1
POLL_INTERVAL=5
STARTUP_TIMEOUT=240
WARMUP_NUM_CTX=32768

# ─── Colors ───────────────────────────────────────────────────────────────────
RED='\033[0;31m'; YELLOW='\033[1;33m'; GREEN='\033[0;32m'
CYAN='\033[0;36m'; BOLD='\033[1m'; RESET='\033[0m'

log()  { echo -e "${CYAN}[runpod]${RESET} $*" >&2; }
ok()   { echo -e "${GREEN}[ok]${RESET} $*" >&2; }
warn() { echo -e "${YELLOW}[warn]${RESET} $*" >&2; }
die()  { echo -e "${RED}[error]${RESET} $*" >&2; exit 1; }

# ─── Dependency check ─────────────────────────────────────────────────────────
for cmd in curl jq; do
  command -v "$cmd" &>/dev/null || die "Required command not found: $cmd"
done

# ─── Config bootstrap ─────────────────────────────────────────────────────────
mkdir -p "$SESSION_CONFIG_DIR"

if [[ ! -f "$SESSION_CONFIG" ]]; then
  warn "No config found — creating $SESSION_CONFIG"
  cat > "$SESSION_CONFIG" <<'CONF'
# runpod-session configuration — edit then re-run.

RUNPOD_API_KEY=""

# Network volume name as shown in the RunPod dashboard
NETWORK_VOLUME_NAME="danixland-storage"

# Must match the key in your opencode.jsonc "provider" block
OPENCODE_PROVIDER="runpod"

# Model to activate by default (used when --model is not passed)
DEFAULT_MODEL="qwen3-coder:latest"

# All models that live on this pod — space-separated Ollama tags.
# These get registered in opencode.jsonc and are warmed up with --all-models.
WARMUP_MODELS="qwen3-coder:latest translategemma:27b"

# GPU selection
DEFAULT_GPU_TYPE=""       # e.g. "RTX PRO 6000" — empty = cheapest available
MAX_PRICE_PER_HR=2.50     # hard $/hr ceiling

# Pod configuration
CONTAINER_DISK_GB=15
GPU_COUNT=1
STARTUP_TIMEOUT=240       # seconds before giving up waiting for Ollama
WARMUP_NUM_CTX=32768      # num_ctx used when warming up models into VRAM
CONF
  echo ""
  echo -e "${YELLOW}Edit ${BOLD}$SESSION_CONFIG${RESET}${YELLOW}, set RUNPOD_API_KEY, then re-run.${RESET}"
  exit 0
fi

# shellcheck source=/dev/null
source "$SESSION_CONFIG"
[[ -z "${RUNPOD_API_KEY:-}" ]] && die "RUNPOD_API_KEY not set in $SESSION_CONFIG"

# ─── Argument parsing ─────────────────────────────────────────────────────────
OPT_MODEL="${DEFAULT_MODEL}"
OPT_ALL_MODELS=0
OPT_GPU_TYPE="${DEFAULT_GPU_TYPE:-}"
OPT_MAX_PRICE="${MAX_PRICE_PER_HR}"
OPT_FORCE_NEW=0
OPT_STOP=0
OPT_STATUS=0

while [[ $# -gt 0 ]]; do
  case "$1" in
    --model)      OPT_MODEL="$2"; shift 2 ;;
    --all-models) OPT_ALL_MODELS=1; shift ;;
    --gpu-type)   OPT_GPU_TYPE="$2"; shift 2 ;;
    --max-price)  OPT_MAX_PRICE="$2"; shift 2 ;;
    --new)        OPT_FORCE_NEW=1; shift ;;
    --stop)       OPT_STOP=1; shift ;;
    --status)     OPT_STATUS=1; shift ;;
    --help|-h)    sed -n '2,17p' "$0"; exit 0 ;;
    *)            die "Unknown option: $1 (use --help)" ;;
  esac
done

# ─── RunPod GraphQL helper ────────────────────────────────────────────────────
gql() {
  local response err
  # Guard the curl call: under set -e a bare failing assignment would exit
  # the script silently before any error message could be printed.
  if ! response=$(curl --ipv4 -s \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $RUNPOD_API_KEY" \
      -d "$1" \
      "$RUNPOD_API") || [[ -z "$response" ]]; then
    die "RunPod API request failed — check API key and connectivity"
  fi
  err=$(echo "$response" | jq -r '.errors[0].message // empty' 2>/dev/null || true)
  [[ -n "$err" ]] && die "RunPod API error: $err"
  echo "$response"
}
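# Usage sketch (same query shape get_pods uses below):
#   pods_json=$(gql '{"query":"{ myself { pods { id name } } }"}')
#   echo "$pods_json" | jq '.data.myself.pods'
# Any .errors[0].message in the response aborts the script via die().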
"RTX PRO 6000" — empty = cheapest available +MAX_PRICE_PER_HR=2.50 # hard $/hr ceiling + +# Pod configuration +CONTAINER_DISK_GB=15 +GPU_COUNT=1 +STARTUP_TIMEOUT=240 # seconds before giving up waiting for Ollama +WARMUP_NUM_CTX=32768 # num_ctx used when warming up models into VRAM +CONF + echo "" + echo -e "${YELLOW}Edit ${BOLD}$SESSION_CONFIG${RESET}${YELLOW}, set RUNPOD_API_KEY, then re-run.${RESET}" + exit 0 +fi + +# shellcheck source=/dev/null +source "$SESSION_CONFIG" +[[ -z "${RUNPOD_API_KEY:-}" ]] && die "RUNPOD_API_KEY not set in $SESSION_CONFIG" + +# ─── Argument parsing ───────────────────────────────────────────────────────── +OPT_MODEL="${DEFAULT_MODEL}" +OPT_ALL_MODELS=0 +OPT_GPU_TYPE="${DEFAULT_GPU_TYPE:-}" +OPT_MAX_PRICE="${MAX_PRICE_PER_HR}" +OPT_FORCE_NEW=0 +OPT_STOP=0 +OPT_STATUS=0 + +while [[ $# -gt 0 ]]; do + case "$1" in + --model) OPT_MODEL="$2"; shift 2 ;; + --all-models) OPT_ALL_MODELS=1; shift ;; + --gpu-type) OPT_GPU_TYPE="$2"; shift 2 ;; + --max-price) OPT_MAX_PRICE="$2"; shift 2 ;; + --new) OPT_FORCE_NEW=1; shift ;; + --stop) OPT_STOP=1; shift ;; + --status) OPT_STATUS=1; shift ;; + --help|-h) sed -n '2,17p' "$0"; exit 0 ;; + *) die "Unknown option: $1 (use --help)" ;; + esac +done + +# ─── RunPod GraphQL helper ──────────────────────────────────────────────────── +gql() { + local err + local response + response=$(curl --ipv4 -s \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer $RUNPOD_API_KEY" \ + -d "$1" \ + "$RUNPOD_API") + if [[ $? -ne 0 || -z "$response" ]]; then + die "RunPod API request failed — check API key and connectivity" + fi + err=$(echo "$response" | jq -r '.errors[0].message // empty' 2>/dev/null || true) + [[ -n "$err" ]] && die "RunPod API error: $err" + echo "$response" +} + +# ─── Pod queries ────────────────────────────────────────────────────────────── +get_pods() { + gql '{"query":"{ myself { pods { id name desiredStatus costPerHr runtime { uptimeInSeconds } machine { gpuDisplayName } } } }"}' +} + +find_ollama_pod() { + # Returns compact JSON of the first pod matching "ollama" in its name, or empty string + local result + result=$(echo "$1" | jq -c \ + '.data.myself.pods[] | select(.name | test("ollama"; "i"))' 2>/dev/null | head -1) + echo "$result" +} + +# ─── --status subcommand ────────────────────────────────────────────────────── +cmd_status() { + echo "" + echo -e "${BOLD}runpod-session status${RESET}" + echo "" + + # ── Live pod data from API ──────────────────────────────────────────────── + local pods_json pod_json + pods_json=$(get_pods) + pod_json=$(find_ollama_pod "$pods_json") + + if [[ -n "$pod_json" ]]; then + local pod_id status gpu cost_hr uptime_sec cost_so_far + pod_id=$(echo "$pod_json" | jq -r '.id') + status=$(echo "$pod_json" | jq -r '.desiredStatus') + gpu=$(echo "$pod_json" | jq -r '.machine.gpuDisplayName // "?"') + cost_hr=$(echo "$pod_json" | jq -r '.costPerHr // 0') + uptime_sec=$(echo "$pod_json" | jq -r '.runtime.uptimeInSeconds // 0') + + # Calculate cost: (uptime_seconds / 3600) * cost_per_hr + cost_so_far=$(echo "$uptime_sec $cost_hr" | awk '{printf "%.4f", ($1/3600)*$2}') + uptime_human=$(echo "$uptime_sec" | awk '{ + h=int($1/3600); m=int(($1%3600)/60); s=$1%60 + if (h>0) printf "%dh %dm %ds", h, m, s + else if (m>0) printf "%dm %ds", m, s + else printf "%ds", s + }') + + echo -e " Pod: ${BOLD}${pod_id}${RESET}" + echo -e " GPU: ${gpu}" + echo -e " Status: ${status}" + echo -e " Rate: \$${cost_hr}/hr" + if [[ "$status" == "RUNNING" ]]; then + echo -e " Uptime: ${uptime_human}" + 
echo -e " Cost: \$${cost_so_far} this session" + else + echo -e " Uptime: —" + echo -e " Cost: —" + fi + echo "" + + # ── Ollama reachability ─────────────────────────────────────────────── + local base="https://${pod_id}-11434.proxy.runpod.net" + local tags_response + tags_response=$(curl -s --ipv4 --max-time 5 "${base}/api/tags" 2>/dev/null || true) + if echo "$tags_response" | jq -e '.models' > /dev/null 2>&1; then + echo -e " Ollama: ${GREEN}reachable${RESET}" + local models + models=$(echo "$tags_response" | jq -r '.models[].name' 2>/dev/null || true) + if [[ -n "$models" ]]; then + echo -e " In VRAM: $(echo "$models" | tr '\n' ' ')" + else + echo -e " In VRAM: none (model will load on first request)" + fi + else + echo -e " Ollama: ${YELLOW}not reachable${RESET}" + fi + else + echo -e " ${YELLOW}No Ollama pod found in your account.${RESET}" + fi + + # ── Saved session state ─────────────────────────────────────────────────── + if [[ -f "$SESSION_STATE" ]]; then + echo "" + echo -e " ${BOLD}Last session record:${RESET}" + jq -r '" Started: \(.started_at)\n Model: \(.model)\n URL: \(.ollama_url)"' \ + "$SESSION_STATE" + fi + + echo "" +} + +# ─── Pod lifecycle ──────────────────────────────────────────────────────────── +restart_pod() { + log "Restarting pod $1 ..." + gql "{\"query\":\"mutation { podResume(input: { podId: \\\"$1\\\", gpuCount: $GPU_COUNT }) { id desiredStatus } }\"}" > /dev/null +} + +stop_pod() { + log "Stopping pod $1 ..." + gql "{\"query\":\"mutation { podStop(input: { podId: \\\"$1\\\" }) { id desiredStatus } }\"}" > /dev/null + ok "Pod stopped. Storage costs continue while stopped (~\$0.002/hr)." +} + +terminate_pod() { + log "Terminating pod $1 ..." + gql "{\"query\":\"mutation { podTerminate(input: { podId: \\\"$1\\\" }) }\"}" > /dev/null + ok "Pod terminated." +} + +# ─── --stop subcommand ──────────────────────────────────────────────────────── +cmd_stop() { + local pod_id="" + if [[ -f "$SESSION_STATE" ]]; then + pod_id=$(jq -r '.pod_id' "$SESSION_STATE") + else + local pods_json pod_json + pods_json=$(get_pods) + pod_json=$(find_ollama_pod "$pods_json") + [[ -z "$pod_json" ]] && die "No Ollama pod found in your account." + pod_id=$(echo "$pod_json" | jq -r '.id') + fi + echo -n " Stop pod $pod_id? [y/N] " + read -r confirm + [[ "${confirm,,}" != "y" ]] && { log "Aborted."; exit 0; } + stop_pod "$pod_id" + rm -f "$SESSION_STATE" +} + +# ─── GPU selection ──────────────────────────────────────────────────────────── +# Returns a JSON array of candidates sorted by price (preferred first if set). +# Caller iterates and tries each until one succeeds. +get_gpu_candidates() { + local preferred="$1" max_price="$2" + log "Querying available GPUs (secure cloud, max \$$max_price/hr) ..." >&2 + + local result candidates + result=$(curl -s --ipv4 \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer $RUNPOD_API_KEY" \ + -d '{"query":"{ gpuTypes { id displayName memoryInGb secureCloud lowestPrice(input: { gpuCount: 1 }) { uninterruptablePrice } } }"}' \ + "$RUNPOD_API") + + candidates=$(echo "$result" | jq -c \ + --arg max "$max_price" \ + '[.data.gpuTypes[] + | select(.secureCloud == true) + | select((.lowestPrice.uninterruptablePrice // 0) > 0) + | select(.lowestPrice.uninterruptablePrice <= ($max | tonumber)) + | { id: .id, name: .displayName, vram: .memoryInGb, + price: .lowestPrice.uninterruptablePrice } + ] | sort_by(.price)') + + [[ -z "$candidates" || "$candidates" == "[]" ]] && \ + die "No GPUs on secure cloud within \$$max_price/hr. 
# ─── Network volume ───────────────────────────────────────────────────────────
get_network_volume_id() {
  local result vol_id
  result=$(curl -s --ipv4 \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -d '{"query":"{ myself { networkVolumes { id name } } }"}' \
    "$RUNPOD_API")
  vol_id=$(echo "$result" | jq -r \
    --arg n "$NETWORK_VOLUME_NAME" \
    '.data.myself.networkVolumes[] | select(.name == $n) | .id')
  [[ -z "$vol_id" ]] && die "Network volume '$NETWORK_VOLUME_NAME' not found."
  echo "$vol_id"
}

# ─── Create pod ───────────────────────────────────────────────────────────────
create_pod() {
  local gpu_id="$1" vol_id="$2"
  log "Creating pod ..."

  # Build the env array — the OLLAMA_MODELS entry is omitted if the path is unset
  local env_json
  env_json='[{"key":"OLLAMA_HOST","value":"0.0.0.0"},{"key":"OLLAMA_LOAD_TIMEOUT","value":"10m"},{"key":"OLLAMA_KEEP_ALIVE","value":"1h"}]'
  if [[ -n "${OLLAMA_MODELS_PATH:-}" ]]; then
    env_json=$(echo "$env_json" | jq \
      --arg v "$OLLAMA_MODELS_PATH" \
      '. + [{"key":"OLLAMA_MODELS","value":$v}]')
  fi

  local payload
  payload=$(jq -n \
    --arg gpu_id "$gpu_id" \
    --arg vol_id "$vol_id" \
    --arg image "$OLLAMA_IMAGE" \
    --argjson gpu_count "$GPU_COUNT" \
    --argjson disk "$CONTAINER_DISK_GB" \
    --argjson env "$env_json" \
    '{query: "mutation($input: PodFindAndDeployOnDemandInput!) { podFindAndDeployOnDemand(input: $input) { id } }",
      variables: { input: {
        cloudType: "SECURE",
        gpuCount: $gpu_count,
        volumeInGb: 0,
        containerDiskInGb: $disk,
        minVcpuCount: 4,
        minMemoryInGb: 15,
        gpuTypeId: $gpu_id,
        name: "ollama-session",
        imageName: $image,
        ports: "11434/http",
        volumeMountPath: "/workspace",
        networkVolumeId: $vol_id,
        env: $env
      }}}')

  local result pod_id err_code
  result=$(curl -s --ipv4 \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -d "$payload" \
    "$RUNPOD_API")

  pod_id=$(echo "$result" | jq -r '.data.podFindAndDeployOnDemand.id // empty')
  if [[ -n "$pod_id" ]]; then
    echo "$pod_id"
    return 0
  fi

  err_code=$(echo "$result" | jq -r '.errors[0].extensions.code // empty')
  if [[ "$err_code" == "SUPPLY_CONSTRAINT" ]]; then
    warn "No supply for GPU $gpu_id — trying the next candidate ..."
    return 1
  fi
  die "Pod creation failed. Response: $result"
}
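# For reference, the assembled payload looks like this (abridged; IDs illustrative):
#   { "query": "mutation($input: PodFindAndDeployOnDemandInput!) { ... }",
#     "variables": { "input": {
#       "cloudType": "SECURE", "gpuCount": 1, "containerDiskInGb": 15,
#       "gpuTypeId": "NVIDIA GeForce RTX 4090", "name": "ollama-session",
#       "imageName": "ollama/ollama:latest", "ports": "11434/http",
#       "volumeMountPath": "/workspace", "networkVolumeId": "vol-...",
#       "env": [ { "key": "OLLAMA_HOST", "value": "0.0.0.0" }, ... ] } } }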
# ─── Wait for Ollama ──────────────────────────────────────────────────────────
wait_for_pod() {
  local pod_id="$1"
  local url="https://${pod_id}-11434.proxy.runpod.net"
  local elapsed=0
  log "Polling $url/api/tags (timeout: ${STARTUP_TIMEOUT}s) ..."
  while (( elapsed < STARTUP_TIMEOUT )); do
    local response
    response=$(curl -s --ipv4 --max-time 5 "${url}/api/tags" 2>/dev/null || true)
    if echo "$response" | jq -e '.models' > /dev/null 2>&1; then
      echo ""; ok "Ollama is up."; return 0
    fi
    printf "  [%3ds] waiting...\r" "$elapsed"
    sleep "$POLL_INTERVAL"
    (( elapsed += POLL_INTERVAL ))
  done
  echo ""
  die "Timed out after ${STARTUP_TIMEOUT}s. Check the RunPod dashboard."
}

# ─── Patch opencode.jsonc ─────────────────────────────────────────────────────
#
# What this changes in your opencode.jsonc:
#   .provider.runpod.options.baseURL → https://<pod-id>-11434.proxy.runpod.net/v1
#   .model                           → runpod/qwen3-coder:latest (DEFAULT_MODEL)
#   .provider.runpod.models          → merges all WARMUP_MODELS in (preserving
#                                      any existing per-model config you have)
#
# Uses jq's `*` (recursive merge), so your existing model overrides are never clobbered.
# A .bak backup is written before any changes.
#
patch_opencode_config() {
  local new_url="$1"   # full URL including /v1

  [[ ! -f "$OPENCODE_CONFIG" ]] && die "opencode config not found at $OPENCODE_CONFIG"
  cp "$OPENCODE_CONFIG" "${OPENCODE_CONFIG}.bak"

  # Build a jq object for all warmup models: { "model:tag": {"tools":true}, ... }
  # tools:true is a safe default — it won't override existing per-model settings
  # because we merge with * and the existing config wins on conflicts.
  local models_patch="{}"
  for m in $WARMUP_MODELS; do
    models_patch=$(printf '%s' "$models_patch" \
      | jq --arg m "$m" '. + {($m): {"tools": true}}')
  done

  local tmp
  tmp=$(mktemp)
  jq \
    --arg provider "$OPENCODE_PROVIDER" \
    --arg url "$new_url" \
    --arg model "${OPENCODE_PROVIDER}/${DEFAULT_MODEL}" \
    --argjson patch "$models_patch" \
    '
    .provider[$provider].options.baseURL = $url
    | .model = $model
    | .provider[$provider].models = (
        $patch * (.provider[$provider].models // {})
      )
    ' "$OPENCODE_CONFIG" > "$tmp" && mv "$tmp" "$OPENCODE_CONFIG"

  ok "opencode.jsonc patched:"
  log "  provider.${OPENCODE_PROVIDER}.options.baseURL = $new_url"
  log "  model = ${OPENCODE_PROVIDER}/${DEFAULT_MODEL}"
}

# ─── Warm up one model ────────────────────────────────────────────────────────
warmup_model() {
  local pod_id="$1" model="$2"
  local base="https://${pod_id}-11434.proxy.runpod.net"

  log "Warming up '$model' into VRAM ..."

  if curl -s --ipv4 --max-time 300 -X POST "${base}/api/generate" \
      -H "Content-Type: application/json" \
      -d "{\"model\": \"$model\", \"prompt\": \"hi\", \"stream\": false, \"options\": {\"num_ctx\": ${WARMUP_NUM_CTX:-32768}}}" \
      > /dev/null 2>&1; then
    ok "  '$model' is loaded."
  else
    warn "  Warmup for '$model' failed — model will load on first use."
  fi
}

# ─── State persistence ────────────────────────────────────────────────────────
save_state() {
  jq -n \
    --arg pod_id "$1" \
    --arg url "$2" \
    --arg model "$3" \
    --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    '{ pod_id: $pod_id, ollama_url: $url, model: $model, started_at: $ts }' \
    > "$SESSION_STATE"
}
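# Resulting state.json (values illustrative):
#   { "pod_id": "abc123xyz",
#     "ollama_url": "https://abc123xyz-11434.proxy.runpod.net/v1",
#     "model": "qwen3-coder:latest",
#     "started_at": "2025-01-01T12:00:00Z" }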
# ─── Main ─────────────────────────────────────────────────────────────────────
main() {
  [[ $OPT_STATUS -eq 1 ]] && { cmd_status; exit 0; }
  [[ $OPT_STOP -eq 1 ]] && { cmd_stop; exit 0; }

  echo -e "${BOLD}runpod-session${RESET} — Ollama on RunPod → opencode"
  echo ""

  local pod_id=""
  local _skip_wait=0

  # ── 1. Existing pod check ─────────────────────────────────────────────────
  if [[ $OPT_FORCE_NEW -eq 0 ]]; then
    log "Checking for existing Ollama pods ..."
    local pods_json pod_json
    pods_json=$(get_pods)
    pod_json=$(find_ollama_pod "$pods_json")

    if [[ -n "$pod_json" ]]; then
      pod_id=$(echo "$pod_json" | jq -r '.id')
      local status gpu cost
      status=$(echo "$pod_json" | jq -r '.desiredStatus')
      gpu=$(echo "$pod_json" | jq -r '.machine.gpuDisplayName // "?"')
      cost=$(echo "$pod_json" | jq -r '.costPerHr // "?"')

      echo -e "  Found: ${BOLD}${pod_id}${RESET}  GPU: ${gpu}  \$${cost}/hr  Status: ${status}"
      echo ""

      case "$status" in
        RUNNING)
          local _check
          _check=$(curl -s --ipv4 --max-time 5 "https://${pod_id}-11434.proxy.runpod.net/api/tags" 2>/dev/null || true)
          if echo "$_check" | jq -e '.models' > /dev/null 2>&1; then
            ok "Already running and reachable — skipping startup sequence."
            _skip_wait=1
          else
            log "Pod is running but Ollama is not yet reachable — waiting ..."
          fi
          ;;
        EXITED|STOPPED)
          echo -n "  [R]estart  [D]elete and create new  [A]bort  [R/d/a]: "
          read -r choice
          case "${choice,,}" in
            d) terminate_pod "$pod_id"; pod_id="" ;;
            a) log "Aborted."; exit 0 ;;
            *) restart_pod "$pod_id" ;;
          esac
          ;;
        *)
          warn "Unexpected pod state '$status' — ignoring this pod."
          pod_id=""
          ;;
      esac
    else
      log "No existing Ollama pod found."
    fi
  fi

  # ── 2. Create a new pod if needed ─────────────────────────────────────────
  if [[ -z "$pod_id" ]]; then
    local vol_id candidates gpu_json gpu_id gpu_name gpu_vram gpu_price
    vol_id=$(get_network_volume_id)
    ok "Volume: ${NETWORK_VOLUME_NAME} ($vol_id)"

    candidates=$(get_gpu_candidates "$OPT_GPU_TYPE" "$OPT_MAX_PRICE")
    local count
    count=$(echo "$candidates" | jq 'length')
    [[ "$count" -eq 0 ]] && die "No GPU candidates found."

    log "${count} GPU options within budget. Will prompt for each."

    local i=0
    while [[ $i -lt $count ]]; do
      gpu_json=$(echo "$candidates" | jq -c --argjson i "$i" '.[$i]')
      gpu_id=$(echo "$gpu_json" | jq -r '.id')
      gpu_name=$(echo "$gpu_json" | jq -r '.name')
      gpu_vram=$(echo "$gpu_json" | jq -r '.vram')
      gpu_price=$(echo "$gpu_json" | jq -r '.price')

      echo ""
      echo -e "  ${BOLD}${gpu_name}${RESET}  ${gpu_vram}GB VRAM  \$${gpu_price}/hr"
      echo -n "  Create pod with this GPU? [Y/n/a(bort)] "
      read -r choice
      case "${choice,,}" in
        a) log "Aborted."; exit 0 ;;
        n) (( i++ )) || true; continue ;;
      esac

      if pod_id=$(create_pod "$gpu_id" "$vol_id"); then
        ok "Pod created: $pod_id  GPU: ${gpu_name}"
        break
      fi
      (( i++ )) || true
    done

    [[ -z "$pod_id" ]] && die "All ${count} GPU candidates exhausted. Raise --max-price or try again later."
  fi

  # ── 3. Wait for Ollama if needed ──────────────────────────────────────────
  if [[ "${_skip_wait:-0}" != "1" ]]; then
    wait_for_pod "$pod_id"
  fi

  # ── 4. Final URL ──────────────────────────────────────────────────────────
  local ollama_url="https://${pod_id}-11434.proxy.runpod.net/v1"
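  # The /v1 suffix targets Ollama's OpenAI-compatible endpoint, which is what
  # opencode's baseURL expects; the native endpoints used elsewhere in this
  # script (/api/tags, /api/generate) live on the bare proxy URL without /v1.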
  # ── 5. Patch opencode.jsonc ───────────────────────────────────────────────
  patch_opencode_config "$ollama_url"

  # ── 6. Warmup ─────────────────────────────────────────────────────────────
  if [[ $OPT_ALL_MODELS -eq 1 ]]; then
    for m in $WARMUP_MODELS; do
      warmup_model "$pod_id" "$m"
    done
  elif [[ -n "$OPT_MODEL" ]]; then
    warmup_model "$pod_id" "$OPT_MODEL"
  else
    warn "No warmup requested. Use --model MODEL or --all-models."
  fi

  # ── 7. Save state ─────────────────────────────────────────────────────────
  save_state "$pod_id" "$ollama_url" "$OPT_MODEL"

  # ── 8. Done ───────────────────────────────────────────────────────────────
  echo ""
  echo -e "${BOLD}${GREEN}Ready.${RESET}"
  printf "  Pod: %s\n" "$pod_id"
  printf "  URL: %s\n" "$ollama_url"
  printf "  Run: opencode\n"
  echo ""
}

main "$@"
