# runpod-session.sh
A single bash script that spins up (or resumes) a RunPod GPU pod running Ollama, waits for it to be reachable, warms up your models into VRAM, and patches your `opencode.jsonc` to point at the live pod — all in one command.
## Requirements
- `curl` and `jq`
- A [RunPod](https://runpod.io) account with:
  - An API key
  - A network volume (for persistent model storage)
- [opencode](https://opencode.ai) installed and configured with a `runpod` provider block
## Installation
```bash
chmod +x runpod-session.sh
# optionally symlink to somewhere on your PATH:
ln -s "$(pwd)/runpod-session.sh" ~/.local/bin/runpod-session
```
On first run the script creates `~/.config/runpod-session/config` with defaults and exits — edit it, then re-run.
## Configuration
Config file: `~/.config/runpod-session/config`
```bash
# runpod-session configuration
RUNPOD_API_KEY="rpa_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Network volume name exactly as shown in RunPod dashboard
NETWORK_VOLUME_NAME="my-storage"
# Path where Ollama looks for models on the network volume
OLLAMA_MODELS_PATH="/workspace/models"
# Must match the key in your opencode.jsonc "provider" block
OPENCODE_PROVIDER="runpod"
# Model to activate by default (used when --model is not passed)
DEFAULT_MODEL="qwen3-coder:latest"
# All models stored on this pod — space-separated Ollama tags.
# These get registered in opencode.jsonc and are warmed up with --all-models.
WARMUP_MODELS="qwen3-coder:latest translategemma:27b"
# Preferred GPU display name (partial match, case-insensitive). Empty = show all within budget.
DEFAULT_GPU_TYPE="RTX PRO 6000"
# Hard $/hr ceiling — only GPUs at or below this price are shown
MAX_PRICE_PER_HR=2.50
# Pod configuration
CONTAINER_DISK_GB=15
GPU_COUNT=1
STARTUP_TIMEOUT=240 # seconds to wait for Ollama to become reachable
WARMUP_NUM_CTX=32768 # context size used when pre-loading models into VRAM
```
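Since the config is sourced as plain bash (see the Files table below), multi-value settings rely on ordinary word splitting. A small illustration of how `WARMUP_MODELS` can be consumed; this is a sketch, not the script's exact code:

```bash
source ~/.config/runpod-session/config

# Space-separated tags become an array via word splitting.
read -r -a models <<< "$WARMUP_MODELS"
for m in "${models[@]}"; do
  echo "registered model: $m"
done
```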
## Usage
```
runpod-session.sh [OPTIONS]

Options:
  --model MODEL       Ollama model tag to warm up (default: DEFAULT_MODEL in config)
  --all-models        Warm up ALL models listed in WARMUP_MODELS
  --gpu-type 'NAME'   Preferred GPU display name (partial match, case-insensitive)
  --max-price PRICE   Max $/hr ceiling (overrides config)
  --new               Force creation of a new pod (skip restart logic)
  --stop              Stop the current running pod
  --status            Show current session state and reachability
  --help              Show this help
```
### Typical workflow
```bash
# Start a session (warm up default model)
runpod-session.sh
# Start with a specific model
runpod-session.sh --model qwen3-coder:latest
# Warm up all configured models
runpod-session.sh --all-models
# Check what's running and how much it's costing
runpod-session.sh --status
# Stop the pod when done (pod is stopped, not terminated — network volume persists)
runpod-session.sh --stop
# Force a fresh pod (terminates any existing stopped pod, creates new)
runpod-session.sh --new --gpu-type "RTX 4090" --max-price 1.80
```
## How it works
1. **Existing pod check** — queries RunPod for any pod with "ollama" in its name (see the API sketch after this list).
   - If running and reachable: skips straight to warmup.
   - If stopped/exited: prompts to restart, delete, or abort.
   - `--new` skips this check entirely.
2. **GPU selection** — queries the RunPod GPU catalog for secure-cloud instances within your price ceiling, sorted by price. If `DEFAULT_GPU_TYPE` is set, matching GPUs are shown first. Each candidate is shown with its VRAM and price; you confirm, skip (`n`), or abort (`a`) for each one. If a GPU you confirm turns out to have no available machines (`SUPPLY_CONSTRAINT`), the script moves to the next candidate automatically.
3. **Pod creation** — deploys `ollama/ollama:latest` on the chosen GPU with your network volume mounted at `/workspace`. Ollama is configured to listen on `0.0.0.0:11434` and keep models in VRAM for 1 hour after last use.
4. **Startup wait** — polls `https://<pod-id>-11434.proxy.runpod.net/api/tags` every 5 seconds until Ollama responds (up to `STARTUP_TIMEOUT` seconds); a polling sketch follows the list.
5. **opencode.jsonc patch** — updates three fields in your opencode config (see the jq sketch below):
   - `provider.runpod.options.baseURL` → the live pod URL
   - `model` → `runpod/<DEFAULT_MODEL>`
   - `provider.runpod.models` → merges all `WARMUP_MODELS` in (existing per-model settings are preserved via jq recursive merge)

   A `.bak` copy is written before any changes.
6. **Model warmup** — sends a short generation request to load the model into VRAM at `WARMUP_NUM_CTX` context length, so the first real request isn't slow (warmup sketch below).
7. **State file** — saves pod ID, URL, model, and timestamp to `~/.config/runpod-session/state.json` for use by `--status` and `--stop`.
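The pod check in step 1 is one API call plus a jq filter. A minimal sketch, assuming RunPod's REST endpoint at `rest.runpod.io/v1/pods` and that it returns a JSON array of pods; the script's actual query may differ:

```bash
source ~/.config/runpod-session/config

# List pods (endpoint and response shape assumed; verify against RunPod's docs).
pods=$(curl -s -H "Authorization: Bearer $RUNPOD_API_KEY" \
  "https://rest.runpod.io/v1/pods")

# First pod whose name contains "ollama", case-insensitive.
pod_id=$(jq -r '[.[] | select(.name | test("ollama"; "i"))][0].id // empty' <<< "$pods")

[[ -n "$pod_id" ]] && echo "found existing pod: $pod_id"
```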
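Step 4 boils down to a readiness loop against Ollama's `/api/tags`. A sketch, reusing `pod_id` from the previous snippet and `STARTUP_TIMEOUT` from the config:

```bash
base_url="https://${pod_id}-11434.proxy.runpod.net"

# Poll every 5 seconds until Ollama answers or the timeout elapses.
elapsed=0
until curl -sf "$base_url/api/tags" >/dev/null; do
  if (( elapsed >= STARTUP_TIMEOUT )); then
    echo "Ollama did not come up within ${STARTUP_TIMEOUT}s" >&2
    exit 1
  fi
  sleep 5
  (( elapsed += 5 ))
done
```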
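Step 5's merge can be pictured with jq roughly as below. One caveat: jq does not parse `//` comments, so this sketch assumes comment-free JSON; how the real script handles JSONC may differ. `POD_URL` here stands in for the proxy URL built in step 4:

```bash
cfg="$HOME/.config/opencode/opencode.jsonc"
cp "$cfg" "$cfg.bak"   # backup before touching anything

# Build {"tag": {}, ...} from the space-separated WARMUP_MODELS list.
models=$(printf '%s\n' $WARMUP_MODELS | jq -R . | jq -s 'map({(.): {}}) | add')

# Recursive merge (*) keeps any per-model settings already present.
jq --arg url "$POD_URL" \
   --arg model "runpod/$DEFAULT_MODEL" \
   --argjson models "$models" '
     .provider.runpod.options.baseURL = $url
     | .model = $model
     | .provider.runpod.models = ($models * (.provider.runpod.models // {}))
   ' "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
```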
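And step 6's warmup is a single short `/api/generate` call; `num_ctx` and `keep_alive` are standard Ollama request fields, the prompt is just a throwaway:

```bash
# A trivial generation forces the model into VRAM at the configured
# context size; keep_alive mirrors the pod's 1-hour retention policy.
curl -s "$base_url/api/generate" -d @- >/dev/null <<EOF
{
  "model": "$DEFAULT_MODEL",
  "prompt": "hi",
  "stream": false,
  "keep_alive": "1h",
  "options": { "num_ctx": $WARMUP_NUM_CTX }
}
EOF
```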
## opencode provider setup
Your `~/.config/opencode/opencode.jsonc` needs a `runpod` provider block before running the script. The script will fill in the `baseURL` on each session start:
```jsonc
{
  "provider": {
    "runpod": {
      "options": {
        "baseURL": "" // filled in automatically by runpod-session.sh
      },
      "models": {}
    }
  },
  "model": "runpod/qwen3-coder:latest"
}
```
## Files
| Path | Purpose |
|------|---------|
| `~/.config/runpod-session/config` | Main config (sourced as bash) |
| `~/.config/runpod-session/state.json` | Last session record |
| `~/.config/opencode/opencode.jsonc` | Patched on each session start |
| `~/.config/opencode/opencode.jsonc.bak` | Backup written before each patch |
## Cost notes
- **Running pod**: billed at the GPU hourly rate shown during selection.
- **Stopped pod**: network volume storage continues at ~$0.002/hr — terminate the pod if you no longer need it.
- `--stop` stops (does not terminate) the pod so it can be quickly restarted without losing the volume.
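If the state file is ever lost or the script misbehaves, a pod can also be stopped by hand. A sketch, assuming the field name `.podId` in `state.json` and RunPod's REST stop endpoint (both worth verifying before relying on this):

```bash
source ~/.config/runpod-session/config

# Field name assumed; inspect state.json to confirm what the script writes.
pod_id=$(jq -r '.podId' ~/.config/runpod-session/state.json)

# Stop (not terminate) the pod; endpoint assumed from RunPod's REST API.
curl -s -X POST -H "Authorization: Bearer $RUNPOD_API_KEY" \
  "https://rest.runpod.io/v1/pods/${pod_id}/stop"
```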