# runpod-session.sh

A single bash script that spins up (or resumes) a RunPod GPU pod running Ollama, waits for it to be reachable, warms up your models into VRAM, and patches your `opencode.jsonc` to point at the live pod — all in one command.

## Requirements

- `curl` and `jq`
- A [RunPod](https://runpod.io) account with:
  - An API key
  - A network volume (for persistent model storage)
- [opencode](https://opencode.ai) installed and configured with a `runpod` provider block

## Installation

```bash
chmod +x runpod-session.sh
# optionally symlink to somewhere on your PATH:
ln -s "$(pwd)/runpod-session.sh" ~/.local/bin/runpod-session
```

On first run the script creates `~/.config/runpod-session/config` with defaults and exits — edit it, then re-run.

## Configuration

Config file: `~/.config/runpod-session/config`

```bash
# runpod-session configuration

RUNPOD_API_KEY="rpa_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Network volume name exactly as shown in RunPod dashboard
NETWORK_VOLUME_NAME="my-storage"

# Path where Ollama looks for models on the network volume
OLLAMA_MODELS_PATH="/workspace/models"

# Must match the key in your opencode.jsonc "provider" block
OPENCODE_PROVIDER="runpod"

# Model to activate by default (used when --model is not passed)
DEFAULT_MODEL="qwen3-coder:latest"

# All models stored on this pod — space-separated Ollama tags.
# These get registered in opencode.jsonc and are warmed up with --all-models.
WARMUP_MODELS="qwen3-coder:latest translategemma:27b"

# Preferred GPU display name (partial match, case-insensitive). Empty = show all within budget.
DEFAULT_GPU_TYPE="RTX PRO 6000"

# Hard $/hr ceiling — only GPUs at or below this price are shown
MAX_PRICE_PER_HR=2.50

# Pod configuration
CONTAINER_DISK_GB=15
GPU_COUNT=1
STARTUP_TIMEOUT=240     # seconds to wait for Ollama to become reachable
WARMUP_NUM_CTX=32768    # context size used when pre-loading models into VRAM

# External tool configs to patch with the live pod URL (leave empty to skip)
TRANSART_SCRIPT=""      # e.g. /home/user/bin/transart.py
PUBLISHER_CONFIG=""     # e.g. /home/user/.config/my-publisher/config.toml
```

## Usage

```
runpod-session.sh [OPTIONS]

Options:
  --model MODEL        Ollama model tag to warm up (default: DEFAULT_MODEL in config)
  --all-models         Warm up ALL models listed in WARMUP_MODELS
  --gpu-type 'NAME'    Preferred GPU display name (partial match, case-insensitive)
  --max-price PRICE    Max $/hr ceiling (overrides config)
  --new                Force creation of a new pod (skip restart logic)
  --stop               Stop the current running pod
  --status             Show current session state and reachability
  --help               Show this help
```

### Typical workflow

```bash
# Start a session (warm up default model)
runpod-session.sh

# Start with a specific model
runpod-session.sh --model qwen3-coder:latest

# Warm up all configured models
runpod-session.sh --all-models

# Check what's running and how much it's costing
runpod-session.sh --status

# Stop the pod when done (pod is stopped, not terminated — network volume persists)
runpod-session.sh --stop

# Force a fresh pod (terminates any existing stopped pod, creates new)
runpod-session.sh --new --gpu-type "RTX 4090" --max-price 1.80
```

## How it works

1. **Existing pod check** — queries RunPod for any pod with "ollama" in its name.
   - If running and reachable: skips straight to warmup.
   - If stopped/exited: prompts to restart, delete, or abort.
   - `--new` skips this check entirely.
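   The name-based lookup in this step boils down to a `jq` filter over the pod list. A minimal sketch — the response shape and field names below are assumptions for illustration, not the actual RunPod API schema (the real script fetches the list with `curl` and the API key):

   ```shell
   # Hypothetical shape of a RunPod pod-list response.
   pods='[{"id":"abc123","name":"ollama-session","desiredStatus":"RUNNING"},
          {"id":"def456","name":"training-job","desiredStatus":"EXITED"}]'

   # Pick the first pod whose name contains "ollama", as step 1 describes.
   pod_id=$(echo "$pods" | jq -r '[.[] | select(.name | test("ollama"))][0].id')
   echo "$pod_id"
   ```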

2. **GPU selection** — queries the RunPod GPU catalog for secure-cloud instances within your price ceiling, sorted by price. If `DEFAULT_GPU_TYPE` is set, matching GPUs are shown first. Each candidate is shown with its VRAM and price; you confirm, skip (`n`), or abort (`a`) for each one. If a GPU you confirm turns out to have no available machines (`SUPPLY_CONSTRAINT`), the script moves to the next candidate automatically.
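
   The filter-and-sort part of this step can be sketched with `jq` alone. The catalog shape and field names here are made up for the example and are not the actual RunPod API schema:

   ```shell
   # Hypothetical GPU catalog; the real one comes from the RunPod API.
   gpus='[{"displayName":"RTX 4090","memoryInGb":24,"securePrice":0.69},
          {"displayName":"H100 80GB","memoryInGb":80,"securePrice":3.99},
          {"displayName":"RTX PRO 6000","memoryInGb":96,"securePrice":1.79}]'

   # Keep GPUs at or below the price ceiling, cheapest first.
   candidates=$(echo "$gpus" | jq -r --argjson max 2.50 \
     '[.[] | select(.securePrice <= $max)] | sort_by(.securePrice) | .[].displayName')
   echo "$candidates"
   ```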

3. **Pod creation** — deploys `ollama/ollama:latest` on the chosen GPU with your network volume mounted at `/workspace`. Ollama is configured to listen on `0.0.0.0:11434` and keep models in VRAM for 1 hour after last use.

4. **Startup wait** — polls `https://<pod-id>-11434.proxy.runpod.net/api/tags` every 5 seconds until Ollama responds (up to `STARTUP_TIMEOUT` seconds).
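
   A minimal sketch of that polling loop — variable names are illustrative, not the script's actual code:

   ```shell
   # Poll the pod's Ollama endpoint until /api/tags answers, giving up
   # after STARTUP_TIMEOUT seconds. POD_ID is assumed to be set by the
   # pod-creation step.
   wait_for_ollama() {
     local base_url="https://${POD_ID}-11434.proxy.runpod.net"
     local elapsed=0
     until curl -sf --max-time 4 "$base_url/api/tags" >/dev/null; do
       sleep 5
       elapsed=$((elapsed + 5))
       if [ "$elapsed" -ge "${STARTUP_TIMEOUT:-240}" ]; then
         return 1
       fi
     done
     echo "Ollama reachable at $base_url"
   }
   ```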

5. **Config patching** — updates your opencode config and any external tools configured via `TRANSART_SCRIPT` / `PUBLISHER_CONFIG`:
   - `provider.runpod.options.baseURL` → the live pod URL
   - `model` → `runpod/<DEFAULT_MODEL>`
   - `provider.runpod.models` → merges all `WARMUP_MODELS` in (existing per-model settings are preserved via jq recursive merge)
   - `OLLAMA_HOST` in `transart.py` → bare pod URL (no `/v1`)
   - `ollama_host` in `my-publisher/config.toml` → bare pod URL (no `/v1`)

   A `.bak` copy is written before each file is modified. Entries left empty in config are skipped.
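
   The merge behaviour is jq's recursive object merge (`*`): nested objects are combined key by key, so settings that exist only in the current file survive while new models are added. A self-contained illustration:

   ```shell
   existing='{"models":{"qwen3-coder:latest":{"name":"Qwen3 Coder","options":{"temperature":0.2}}}}'
   incoming='{"models":{"qwen3-coder:latest":{"name":"Qwen3 Coder"},"translategemma:27b":{"name":"TranslateGemma 27B"}}}'

   # Recursive merge: the temperature setting is preserved, the new model added.
   echo "$existing" | jq --argjson new "$incoming" '. * $new'
   ```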

6. **Model warmup** — sends a short generation request to load the model into VRAM at `WARMUP_NUM_CTX` context length, so the first real request isn't slow.
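
   The warmup is a plain Ollama `/api/generate` call. A sketch — the function wrapper and `BASE_URL` variable are illustrative, not the script's actual code:

   ```shell
   # Load a model into VRAM with a trivial prompt: keep_alive matches the
   # pod's one-hour setting, num_ctx the configured warmup context size.
   # Prints the response JSON.
   warmup_model() {
     local model="$1"
     curl -s "$BASE_URL/api/generate" -d "{
       \"model\": \"$model\",
       \"prompt\": \"hi\",
       \"stream\": false,
       \"options\": { \"num_ctx\": ${WARMUP_NUM_CTX:-32768} },
       \"keep_alive\": \"1h\"
     }"
   }
   ```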

7. **State file** — saves pod ID, URL, model, and timestamp to `~/.config/runpod-session/state.json` for use by `--status` and `--stop`.
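
   Such a record can be produced with `jq -n`; the field names below are assumptions based on the description, not the script's exact schema:

   ```shell
   # Write the session record. STATE_FILE, POD_ID, BASE_URL, and MODEL are
   # assumed to have been set by earlier steps.
   write_state() {
     jq -n --arg pod_id "$POD_ID" --arg url "$BASE_URL" --arg model "$MODEL" \
       '{pod_id: $pod_id, url: $url, model: $model, started_at: (now | todate)}' \
       > "$STATE_FILE"
   }
   ```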

## opencode provider setup

Your `~/.config/opencode/opencode.jsonc` needs a `runpod` provider block before running the script. The script will fill in the `baseURL` on each session start:

```jsonc
{
  "provider": {
    "runpod": {
      "options": {
        "baseURL": ""  // filled in automatically by runpod-session.sh
      },
      "models": {}
    }
  },
  "model": "runpod/qwen3-coder:latest"
}
```

## Files

| Path | Purpose |
|------|---------|
| `~/.config/runpod-session/config` | Main config (sourced as bash) |
| `~/.config/runpod-session/state.json` | Last session record |
| `~/.config/opencode/opencode.jsonc` | Patched on each session start |
| `~/.config/opencode/opencode.jsonc.bak` | Backup written before each patch |
| `$TRANSART_SCRIPT` | `OLLAMA_HOST` updated if set in config |
| `$PUBLISHER_CONFIG` | `ollama_host` updated if set in config |

## Cost notes

- **Running pod**: billed at the GPU hourly rate shown during selection.
- **Stopped pod**: network volume storage continues at ~$0.002/hr — terminate the pod if you no longer need it.
- `--stop` stops (does not terminate) the pod so it can be quickly restarted without losing the volume.