Diffstat (limited to 'README.md')
-rw-r--r--  README.md  154
1 file changed, 154 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..d12d1ec
--- /dev/null
+++ b/README.md
@@ -0,0 +1,154 @@
+# runpod-session.sh
+
+A single bash script that spins up (or resumes) a RunPod GPU pod running Ollama, waits for it to be reachable, warms up your models into VRAM, and patches your `opencode.jsonc` to point at the live pod — all in one command.
+
+## Requirements
+
+- `curl` and `jq`
+- A [RunPod](https://runpod.io) account with:
+ - An API key
+ - A network volume (for persistent model storage)
+- [opencode](https://opencode.ai) installed and configured with a `runpod` provider block
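+
+A quick way to confirm the CLI dependencies are present (illustrative):
+
+```bash
+for cmd in curl jq; do
+  command -v "$cmd" >/dev/null || echo "missing dependency: $cmd" >&2
+done
+```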
+
+## Installation
+
+```bash
+chmod +x runpod-session.sh
+# optionally symlink to somewhere on your PATH:
+ln -s "$(pwd)/runpod-session.sh" ~/.local/bin/runpod-session
+```
+
+On first run the script creates `~/.config/runpod-session/config` with defaults and exits — edit it, then re-run.
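+
+Roughly what that first-run check looks like (a sketch; the real script writes out the full default config shown in the next section):
+
+```bash
+CONFIG_FILE="$HOME/.config/runpod-session/config"
+if [[ ! -f "$CONFIG_FILE" ]]; then
+  mkdir -p "$(dirname "$CONFIG_FILE")"
+  # ... write the default config shown under "Configuration" ...
+  echo "Wrote default config to $CONFIG_FILE; edit it and re-run." >&2
+  exit 0
+fi
+```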
+
+## Configuration
+
+Config file: `~/.config/runpod-session/config`
+
+```bash
+# runpod-session configuration
+
+RUNPOD_API_KEY="rpa_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
+
+# Network volume name exactly as shown in RunPod dashboard
+NETWORK_VOLUME_NAME="my-storage"
+
+# Path where Ollama looks for models on the network volume
+OLLAMA_MODELS_PATH="/workspace/models"
+
+# Must match the key in your opencode.jsonc "provider" block
+OPENCODE_PROVIDER="runpod"
+
+# Model to activate by default (used when --model is not passed)
+DEFAULT_MODEL="qwen3-coder:latest"
+
+# All models stored on this pod — space-separated Ollama tags.
+# These get registered in opencode.jsonc and are warmed up with --all-models.
+WARMUP_MODELS="qwen3-coder:latest translategemma:27b"
+
+# Preferred GPU display name (partial match, case-insensitive). Empty = show all within budget.
+DEFAULT_GPU_TYPE="RTX PRO 6000"
+
+# Hard $/hr ceiling — only GPUs at or below this price are shown
+MAX_PRICE_PER_HR=2.50
+
+# Pod configuration
+CONTAINER_DISK_GB=15
+GPU_COUNT=1
+STARTUP_TIMEOUT=240 # seconds to wait for Ollama to become reachable
+WARMUP_NUM_CTX=32768 # context size used when pre-loading models into VRAM
+```
+
+## Usage
+
+```
+runpod-session.sh [OPTIONS]
+
+Options:
+ --model MODEL Ollama model tag to warm up (default: DEFAULT_MODEL in config)
+ --all-models Warm up ALL models listed in WARMUP_MODELS
+ --gpu-type 'NAME' Preferred GPU display name (partial match, case-insensitive)
+ --max-price PRICE Max $/hr ceiling (overrides config)
+ --new Force creation of a new pod (skip restart logic)
+ --stop Stop the current running pod
+ --status Show current session state and reachability
+ --help Show this help
+```
+
+### Typical workflow
+
+```bash
+# Start a session (warm up default model)
+runpod-session.sh
+
+# Start with a specific model
+runpod-session.sh --model qwen3-coder:latest
+
+# Warm up all configured models
+runpod-session.sh --all-models
+
+# Check what's running and how much it's costing
+runpod-session.sh --status
+
+# Stop the pod when done (pod is stopped, not terminated — network volume persists)
+runpod-session.sh --stop
+
+# Force a fresh pod (terminates any existing stopped pod, creates new)
+runpod-session.sh --new --gpu-type "RTX 4090" --max-price 1.80
+```
+
+## How it works
+
+1. **Existing pod check** — queries RunPod for any pod with "ollama" in its name.
+ - If running and reachable: skips straight to warmup.
+ - If stopped/exited: prompts to restart, delete, or abort.
+ - `--new` skips this check entirely.
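+
+   A minimal sketch of such a lookup against RunPod's GraphQL API (field names follow the public schema; the script's actual query may differ):
+
+   ```bash
+   # List pods whose name contains "ollama", with their current status.
+   curl -s "https://api.runpod.io/graphql?api_key=${RUNPOD_API_KEY}" \
+     -H 'Content-Type: application/json' \
+     -d '{"query": "{ myself { pods { id name desiredStatus } } }"}' |
+     jq -r '.data.myself.pods[]
+            | select(.name | test("ollama"; "i"))
+            | [.id, .name, .desiredStatus] | @tsv'
+   ```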
+
+2. **GPU selection** — queries the RunPod GPU catalog for secure-cloud instances within your price ceiling, sorted by price. If `DEFAULT_GPU_TYPE` is set, matching GPUs are shown first. Each candidate is shown with its VRAM and price; you confirm, skip (`n`), or abort (`a`) for each one. If a GPU you confirm turns out to have no available machines (`SUPPLY_CONSTRAINT`), the script moves to the next candidate automatically.
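+
+   A sketch of the catalog query (the `gpuTypes` fields follow RunPod's public schema; the filtering and sorting shown are illustrative):
+
+   ```bash
+   # Secure-cloud GPUs at or below the price ceiling, cheapest first.
+   curl -s "https://api.runpod.io/graphql?api_key=${RUNPOD_API_KEY}" \
+     -H 'Content-Type: application/json' \
+     -d '{"query": "{ gpuTypes { id displayName memoryInGb securePrice } }"}' |
+     jq -r --argjson max "$MAX_PRICE_PER_HR" \
+        '.data.gpuTypes[]
+         | select(.securePrice != null and .securePrice <= $max)
+         | [.securePrice, .displayName, "\(.memoryInGb) GB"] | @tsv' |
+     sort -n
+   ```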
+
+3. **Pod creation** — deploys `ollama/ollama:latest` on the chosen GPU with your network volume mounted at `/workspace`. Ollama is configured to listen on `0.0.0.0:11434` and keep models in VRAM for 1 hour after last use.
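+
+   A sketch of the deployment call using RunPod's `podFindAndDeployOnDemand` mutation (the GPU type, pod name, and volume ID are placeholders; the script fills them in from the previous steps and the config):
+
+   ```bash
+   MUTATION=$(cat <<'EOF'
+   mutation {
+     podFindAndDeployOnDemand(input: {
+       cloudType: SECURE, gpuCount: 1, gpuTypeId: "NVIDIA GeForce RTX 4090",
+       name: "ollama-session", imageName: "ollama/ollama:latest",
+       containerDiskInGb: 15, volumeMountPath: "/workspace",
+       networkVolumeId: "YOUR_VOLUME_ID", ports: "11434/http",
+       env: [{ key: "OLLAMA_HOST", value: "0.0.0.0:11434" },
+             { key: "OLLAMA_KEEP_ALIVE", value: "1h" },
+             { key: "OLLAMA_MODELS", value: "/workspace/models" }]
+     }) { id }
+   }
+   EOF
+   )
+   curl -s "https://api.runpod.io/graphql?api_key=${RUNPOD_API_KEY}" \
+     -H 'Content-Type: application/json' \
+     -d "$(jq -n --arg q "$MUTATION" '{query: $q}')" |
+     jq -r '.data.podFindAndDeployOnDemand.id'
+   ```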
+
+4. **Startup wait** — polls `https://<pod-id>-11434.proxy.runpod.net/api/tags` every 5 seconds until Ollama responds (up to `STARTUP_TIMEOUT` seconds).
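+
+   Conceptually the wait loop is just:
+
+   ```bash
+   OLLAMA_URL="https://${POD_ID}-11434.proxy.runpod.net"   # POD_ID from the creation step
+   deadline=$(( $(date +%s) + STARTUP_TIMEOUT ))
+   until curl -sf "${OLLAMA_URL}/api/tags" >/dev/null; do
+     (( $(date +%s) >= deadline )) && { echo "timed out waiting for Ollama" >&2; exit 1; }
+     sleep 5
+   done
+   ```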
+
+5. **opencode.jsonc patch** — updates three fields in your opencode config:
+ - `provider.runpod.options.baseURL` → the live pod URL
+ - `model` → `runpod/<DEFAULT_MODEL>`
+   - `provider.runpod.models` → merges in all `WARMUP_MODELS` (existing per-model settings are preserved via jq's recursive merge)
+
+ A `.bak` copy is written before any changes.
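+
+   A sketch of that patch (note that jq itself cannot parse `//` comments, so this assumes a comment-free `opencode.jsonc`; the variable names are illustrative):
+
+   ```bash
+   CFG="$HOME/.config/opencode/opencode.jsonc"
+   cp "$CFG" "$CFG.bak"
+   # Build {"tag": {}, ...} from the space-separated WARMUP_MODELS list.
+   MODELS_JSON=$(printf '%s\n' $WARMUP_MODELS | jq -Rn '[inputs | {(.): {}}] | add')
+   jq --arg url "$OLLAMA_URL" --arg model "runpod/$DEFAULT_MODEL" \
+      --argjson models "$MODELS_JSON" '
+     .provider.runpod.options.baseURL = $url
+     | .model = $model
+     | .provider.runpod.models = ((.provider.runpod.models // {}) * $models)
+   ' "$CFG" > "$CFG.tmp" && mv "$CFG.tmp" "$CFG"
+   ```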
+
+6. **Model warmup** — sends a short generation request to load the model into VRAM at `WARMUP_NUM_CTX` context length, so the first real request isn't slow.
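+
+   The warmup is a plain Ollama `/api/generate` request; a sketch:
+
+   ```bash
+   # One predicted token is enough to force the model into VRAM at the
+   # configured context size; keep_alive matches the pod's 1h retention.
+   curl -s "${OLLAMA_URL}/api/generate" -d "$(jq -n \
+     --arg model "$DEFAULT_MODEL" --argjson ctx "$WARMUP_NUM_CTX" \
+     '{model: $model, prompt: "hi", stream: false, keep_alive: "1h",
+       options: {num_ctx: $ctx, num_predict: 1}}')"
+   ```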
+
+7. **State file** — saves pod ID, URL, model, and timestamp to `~/.config/runpod-session/state.json` for use by `--status` and `--stop`.
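+
+   The state file is plain JSON; something along these lines (the field names here are illustrative):
+
+   ```bash
+   jq -n --arg pod "$POD_ID" --arg url "$OLLAMA_URL" --arg model "$DEFAULT_MODEL" \
+     '{pod_id: $pod, url: $url, model: $model, started_at: (now | todate)}' \
+     > "$HOME/.config/runpod-session/state.json"
+   ```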
+
+## opencode provider setup
+
+Your `~/.config/opencode/opencode.jsonc` needs a `runpod` provider block before running the script. The script will fill in the `baseURL` on each session start:
+
+```jsonc
+{
+ "provider": {
+ "runpod": {
+ "options": {
+ "baseURL": "" // filled in automatically by runpod-session.sh
+ },
+ "models": {}
+ }
+ },
+ "model": "runpod/qwen3-coder:latest"
+}
+```
+
+## Files
+
+| Path | Purpose |
+|------|---------|
+| `~/.config/runpod-session/config` | Main config (sourced as bash) |
+| `~/.config/runpod-session/state.json` | Last session record |
+| `~/.config/opencode/opencode.jsonc` | Patched on each session start |
+| `~/.config/opencode/opencode.jsonc.bak` | Backup written before each patch |
+
+## Cost notes
+
+- **Running pod**: billed at the GPU hourly rate shown during selection.
+- **Stopped pod**: network volume storage continues at ~$0.002/hr (about $1.50/month); terminate the pod if you no longer need it.
+- `--stop` stops (does not terminate) the pod so it can be quickly restarted without losing the volume.