Replace default_model with force_model (model lock)

Removes DEFAULT_MODEL in favour of a force_model setting configurable
via the admin UI. When set, every proxy request's model field is
overridden, preventing uncoordinated model switches during lab sessions.
Updates schemas, admin API, all three proxy endpoints, frontend,
init_db, and docs (README, DOCKERHUB, KURZANLEITUNG).
Oliver Hofmann 2026-05-08 08:02:16 +02:00
parent cced65693c
commit 34b108f4df
9 changed files with 41 additions and 34 deletions


@@ -7,6 +7,7 @@ A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API ke
 - OpenAI-compatible endpoint (`/v1/chat/completions`, `/v1/models`)
 - API key management with daily and monthly token/request limits
 - Web-based admin interface (port 8001)
+- Model lock: enforces a specific model for all requests (useful for courses and lab sessions)
 - Streaming support (Server-Sent Events)
 - Tool use / function calling passthrough
 - Rotating usage logs
@@ -27,7 +28,6 @@ All API endpoints require the `ADMIN_PASSWORD` — without a valid token, only t
 |----------|---------|-------------|
 | `ADMIN_PASSWORD` | | **Required.** Password for the admin interface |
 | `OLLAMA_URL` | `http://localhost:11434` | URL of the Ollama server (without `/v1` suffix) |
-| `DEFAULT_MODEL` | `llama3` | Model used when the client does not specify one |
 | `DATABASE_URL` | `sqlite:///./test.db` | Database connection string (SQLite or PostgreSQL) |
 | `PROXY_HOST` | `0.0.0.0` | Proxy bind address |
 | `PROXY_PORT` | `8000` | Proxy port |
@@ -59,7 +59,6 @@ volumes:
 ```env
 ADMIN_PASSWORD=changeme
 OLLAMA_URL=http://localhost:11434
-DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
 ```
@@ -78,7 +77,7 @@ services:
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
       APP_TZ: Europe/Berlin
     volumes:
       - llmproxy-data:/app/backend
@@ -111,7 +110,7 @@ services:
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
       APP_TZ: Europe/Berlin
       DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
     depends_on:


@@ -7,6 +7,7 @@ A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with
 - OpenAI-compatible endpoint (`/v1/chat/completions`, `/v1/models`)
 - API key management with daily and monthly token/request limits
 - Web-based admin interface (port 8001)
+- Model lock: enforces a specific model for all requests (useful for lab sessions and courses)
 - Streaming support (Server-Sent Events)
 - Tool use / function calling is passed through
 - Rotating usage logs
@@ -27,7 +28,6 @@ All API endpoints require the `ADMIN_PASSWORD`; access without a valid
 |----------|----------|--------------|
 | `ADMIN_PASSWORD` | | **Required.** Password for the admin interface |
 | `OLLAMA_URL` | `http://localhost:11434` | URL of the Ollama server (without `/v1` suffix) |
-| `DEFAULT_MODEL` | `llama3` | Model used when the client does not specify one |
 | `DATABASE_URL` | `sqlite:///./test.db` | Database connection string (SQLite or PostgreSQL) |
 | `PROXY_HOST` | `0.0.0.0` | Proxy bind address |
 | `PROXY_PORT` | `8000` | Proxy port |
@@ -59,7 +59,6 @@ volumes:
 ```env
 ADMIN_PASSWORD=changeme
 OLLAMA_URL=http://localhost:11434
-DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
 ```
@@ -78,7 +77,7 @@ services:
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
       APP_TZ: Europe/Berlin
     volumes:
       - llmproxy-data:/app/backend
@@ -111,7 +110,7 @@ services:
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
       APP_TZ: Europe/Berlin
       DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
     depends_on:


@@ -166,3 +166,12 @@ The web interface for managing API keys and quotas is available at:
 **`http://141.75.33.244:8001`**
 There, API keys can be created, deactivated, and assigned quotas.
+### Model lock for lab sessions
+Under **Einstellungen → Aktives Modell (Lock)** ("Settings → Active model (lock)"), a model can be fixed for all users. When a lock is set, the `model` field of every request is replaced with this model, regardless of what the client sends. This prevents uncoordinated model switches during a session, which would slow down all participants through long model loading times.
+Typical workflow for a lab session:
+1. Before the session: load the appropriate model in Ollama
+2. Activate the lock in the admin interface
+3. After the session: deactivate the lock again (clear the field)
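
Steps 2 and 3 of the workflow above could also be scripted against the admin API instead of clicking through the UI. The following is a minimal sketch, not part of this commit: it only builds the `PUT /api/settings` requests and does not send them. The base URL (`localhost` on the default admin port 8001) and the helper name `build_lock_request` are assumptions, and authentication is omitted because the diff does not show how `require_admin_auth` expects credentials.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the admin API listens on its default port on this host.
ADMIN_BASE = "http://localhost:8001"


def build_lock_request(ollama_url: str, force_model: Optional[str]) -> urllib.request.Request:
    """Build (but do not send) a PUT /api/settings request.

    Hypothetical helper. Passing None or "" for force_model clears the lock.
    Authentication headers are deliberately left out (mechanism not shown in the diff).
    """
    payload = {"ollama_url": ollama_url, "force_model": force_model or None}
    return urllib.request.Request(
        f"{ADMIN_BASE}/api/settings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Before the session: lock every request to one model.
lock = build_lock_request("http://localhost:11434", "llama3")
# After the session: clear the lock again.
unlock = build_lock_request("http://localhost:11434", None)

print(lock.get_method(), lock.full_url)
print(json.loads(unlock.data))
```

Sending either request would be a matter of adding whatever credentials the admin API expects and passing the object to `urllib.request.urlopen`.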


@@ -33,7 +33,6 @@ ADMIN_HOST=0.0.0.0
 ADMIN_PORT=8001
 DATABASE_URL=sqlite:///./test.db
 OLLAMA_URL=http://localhost:11434
-DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
 LOG_FILE=logs/usage.log
 ```
@@ -47,7 +46,6 @@ LOG_FILE=logs/usage.log
 | `ADMIN_PORT` | `8001` | Port of the admin API |
 | `DATABASE_URL` | `sqlite:///./test.db` | DB connection string (SQLite or PostgreSQL) |
 | `OLLAMA_URL` | `http://localhost:11434` | Address of the Ollama instance (can also be changed in the UI) |
-| `DEFAULT_MODEL` | `llama3` | Default model for `/v1/chat/completions` (can also be changed in the UI) |
 | `APP_TZ` | `Europe/Berlin` | Time zone for daily/monthly quota resets |
 | `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
 | `ALLOWED_ORIGINS` | `http://localhost:5173` | CORS origins (only relevant for development) |


@@ -137,7 +137,7 @@ async def get_proxy_info(_ = Depends(require_admin_auth)):
 async def read_settings(db: Session = Depends(get_db), _ = Depends(require_admin_auth)):
     return schemas.Settings(
         ollama_url=crud.get_setting(db, "ollama_url", "http://localhost:11434"),
-        default_model=crud.get_setting(db, "default_model", "llama3"),
+        force_model=crud.get_setting(db, "force_model") or None,
     )
 @app.put("/api/settings", response_model=schemas.Settings)
@@ -148,8 +148,8 @@ async def update_settings(
 ):
     ollama_url = settings.ollama_url.rstrip('/').removesuffix('/v1')
     crud.set_setting(db, "ollama_url", ollama_url)
-    crud.set_setting(db, "default_model", settings.default_model)
-    return schemas.Settings(ollama_url=ollama_url, default_model=settings.default_model)
+    crud.set_setting(db, "force_model", settings.force_model or "")
+    return schemas.Settings(ollama_url=ollama_url, force_model=settings.force_model or None)
 @app.get("/api/ollama-models")
 async def get_ollama_models(
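
The two settings endpoints above use different representations of "no lock": `None` in the API schema and an empty string in the settings store. A minimal sketch of that round trip follows; it is not part of this commit. A plain dict stands in for the settings table, and `to_stored` / `from_stored` are hypothetical names for the `or ""` and `or None` expressions in the diff.

```python
from typing import Optional


def to_stored(force_model: Optional[str]) -> str:
    """API value -> stored value: None (no lock) is persisted as ""."""
    return force_model or ""


def from_stored(value: Optional[str]) -> Optional[str]:
    """Stored value -> API value: "" or a missing row both mean no lock."""
    return value or None


# Assumption: a dict stands in for the key/value settings table.
store: dict[str, str] = {}

store["force_model"] = to_stored("llama3")
print(from_stored(store.get("force_model")))  # llama3

store["force_model"] = to_stored(None)
print(from_stored(store.get("force_model")))  # None
```

Because both directions treat any falsy value as "no lock", a never-written setting and an explicitly cleared one behave identically.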


@@ -13,8 +13,6 @@ def init_db():
     db = SessionLocal()
     if not get_setting(db, "ollama_url"):
         set_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
-    if not get_setting(db, "default_model"):
-        set_setting(db, "default_model", os.getenv("DEFAULT_MODEL", "llama3"))
     db.close()
     print("Database initialized.")


@@ -70,8 +70,6 @@ def apply_env_settings():
     try:
         if url := os.getenv("OLLAMA_URL"):
             crud.set_setting(db, "ollama_url", url)
-        if model := os.getenv("DEFAULT_MODEL"):
-            crud.set_setting(db, "default_model", model)
         db.commit()
     finally:
         db.close()
@@ -91,6 +89,9 @@ async def proxy_request(url: str, method: str = "GET", json_data: dict = None):
 async def generate(request: Request, db: Session = Depends(get_db)):
     ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
     body = await request.json()
+    force_model = crud.get_setting(db, "force_model") or None
+    if force_model:
+        body = {**body, "model": force_model}
     prompt_tokens = crud.count_tokens(body.get("prompt", ""))
     if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
@@ -115,6 +116,9 @@ async def generate(request: Request, db: Session = Depends(get_db)):
 async def chat(request: Request, db: Session = Depends(get_db)):
     ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
     body = await request.json()
+    force_model = crud.get_setting(db, "force_model") or None
+    if force_model:
+        body = {**body, "model": force_model}
     messages = body.get("messages", [])
     prompt_tokens = sum(crud.count_tokens(_content_to_str(msg.get("content"))) for msg in messages)
@@ -156,19 +160,19 @@ async def list_openai_models(db: Session = Depends(get_db)):
 @app.post("/v1/chat/completions")
 async def openai_chat_completions(request: Request, db: Session = Depends(get_db)):
     ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
-    default_model = crud.get_setting(db, "default_model", os.getenv("DEFAULT_MODEL", "llama3"))
     body = await request.json()
+    force_model = crud.get_setting(db, "force_model") or None
+    if force_model:
+        body = {**body, "model": force_model}
     messages = body.get("messages", [])
     prompt_tokens = sum(crud.count_tokens(_content_to_str(msg.get("content"))) for msg in messages)
     if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
         raise HTTPException(status_code=429, detail="Quota exceeded")
-    if "model" not in body:
-        body = {**body, "model": default_model}
-    model_name = body["model"]
+    model_name = body.get("model", "?")
     usage_log.info('%s | /v1/chat/completions | %s | ~%d tokens | "%s"',
                    request.state.api_key_name, model_name, prompt_tokens, _last_user_msg(messages))
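
The same three-line override appears in all three proxy endpoints above. As a rough sketch, not part of this commit, it could be factored into one helper. `apply_force_model` is a hypothetical name, and a plain dict stands in for the parsed request body.

```python
from typing import Any, Optional


def apply_force_model(body: dict[str, Any], force_model: Optional[str]) -> dict[str, Any]:
    """Return the request body with its model overridden when a lock is set.

    Hypothetical helper. None or "" means "no lock", mirroring
    `crud.get_setting(db, "force_model") or None` in the diff.
    """
    if not force_model:
        return body
    # Build a new dict so the caller's original body is left untouched.
    return {**body, "model": force_model}


client_body = {"model": "mistral", "prompt": "hi"}

locked = apply_force_model(client_body, "llama3")
unlocked = apply_force_model(client_body, "")

print(locked["model"])       # llama3
print(unlocked["model"])     # mistral
print(client_body["model"])  # mistral (original body unchanged)
```

Note that with no lock set and no `model` in the request, the body is forwarded without a model, since the commit removes the old `DEFAULT_MODEL` fallback.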


@@ -40,7 +40,7 @@ class QuotaUpdate(BaseModel):
 class Settings(BaseModel):
     ollama_url: str
-    default_model: str
+    force_model: Optional[str] = None
 class UsageStats(BaseModel):
     tokens_used_today: int = 0


@@ -95,8 +95,8 @@ function SettingsSection({ password }) {
       const { models, reachable } = res.data;
       setOllamaReachable(reachable);
       setAvailableModels(models);
-      if (models.length > 0 && !models.includes(currentModel)) {
-        setSettings(s => ({ ...s, default_model: models[0] }));
+      if (models.length > 0 && currentModel && !models.includes(currentModel)) {
+        setSettings(s => ({ ...s, force_model: models[0] }));
       }
     } catch {
       setOllamaReachable(false);
@@ -115,7 +115,7 @@ function SettingsSection({ password }) {
       const s = settingsRes.data;
       setSettings(s);
       setProxyEndpoint(proxyRes.data.endpoint);
-      fetchModels(s.ollama_url, s.default_model);
+      fetchModels(s.ollama_url, s.force_model);
     }).catch(() => setError('Einstellungen konnten nicht geladen werden.'));
   }, []);
@@ -152,7 +152,7 @@ function SettingsSection({ password }) {
             type="url"
             value={settings.ollama_url}
             onChange={(e) => setSettings({ ...settings, ollama_url: e.target.value })}
-            onBlur={(e) => fetchModels(e.target.value, settings.default_model)}
+            onBlur={(e) => fetchModels(e.target.value, settings.force_model)}
             placeholder="http://localhost:11434"
             required
           />
@@ -162,23 +162,23 @@ function SettingsSection({ password }) {
           </div>
         </div>
         <div className="settings-row">
-          <label>Standard-Modell</label>
+          <label>Aktives Modell (Lock)</label>
           {modelsLoading ? (
             <span className="settings-value">Lade Modelle</span>
           ) : availableModels.length > 0 ? (
             <select
-              value={settings.default_model}
-              onChange={(e) => setSettings({ ...settings, default_model: e.target.value })}
+              value={settings.force_model || ""}
+              onChange={(e) => setSettings({ ...settings, force_model: e.target.value || null })}
             >
+              <option value=""> kein Lock </option>
               {availableModels.map(m => <option key={m} value={m}>{m}</option>)}
             </select>
           ) : (
             <input
               type="text"
-              value={settings.default_model}
-              onChange={(e) => setSettings({ ...settings, default_model: e.target.value })}
-              placeholder="llama3"
-              required
+              value={settings.force_model || ""}
+              onChange={(e) => setSettings({ ...settings, force_model: e.target.value || null })}
+              placeholder="leer = kein Lock"
             />
           )}
         </div>