Replace default_model with force_model (model lock)
Removes DEFAULT_MODEL in favour of a force_model setting configurable via the admin UI. When set, every proxy request's model field is overridden, preventing uncoordinated model switches during lab sessions. Updates schemas, admin API, all three proxy endpoints, frontend, init_db, and docs (README, DOCKERHUB, KURZANLEITUNG).
parent cced65693c
commit 34b108f4df
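The override rule described in the commit message can be sketched in isolation. This is a minimal illustration, not the project's code: `apply_model_lock` is a hypothetical helper name, and the body is a plain dict standing in for the parsed JSON request.

```python
from typing import Any, Optional


def apply_model_lock(body: dict[str, Any], force_model: Optional[str]) -> dict[str, Any]:
    """Return the request body that would be forwarded upstream.

    If a lock is set (non-empty string), the "model" field is replaced;
    otherwise the body is returned unchanged.
    """
    if force_model:
        # Build a new dict so the caller's original body is not mutated.
        return {**body, "model": force_model}
    return body


locked = apply_model_lock({"model": "mistral", "prompt": "hi"}, "llama3")
unlocked = apply_model_lock({"model": "mistral", "prompt": "hi"}, None)
print(locked["model"])    # llama3
print(unlocked["model"])  # mistral
```

With a lock set, the client's choice is ignored even when the request carries no `model` field at all; without a lock, the body passes through untouched.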
@@ -7,6 +7,7 @@ A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API ke
 - OpenAI-compatible endpoint (`/v1/chat/completions`, `/v1/models`)
 - API key management with daily and monthly token/request limits
 - Web-based admin interface (port 8001)
+- Model lock: enforces a specific model for all requests (useful for courses and lab sessions)
 - Streaming support (Server-Sent Events)
 - Tool use / function calling passthrough
 - Rotating usage logs
@@ -27,7 +28,6 @@ All API endpoints require the `ADMIN_PASSWORD` — without a valid token, only t
 |----------|---------|-------------|
 | `ADMIN_PASSWORD` | – | **Required.** Password for the admin interface |
 | `OLLAMA_URL` | `http://localhost:11434` | URL of the Ollama server (without `/v1` suffix) |
-| `DEFAULT_MODEL` | `llama3` | Model used when the client does not specify one |
 | `DATABASE_URL` | `sqlite:///./test.db` | Database connection string (SQLite or PostgreSQL) |
 | `PROXY_HOST` | `0.0.0.0` | Proxy bind address |
 | `PROXY_PORT` | `8000` | Proxy port |
@@ -59,7 +59,6 @@ volumes:
 ```env
 ADMIN_PASSWORD=changeme
 OLLAMA_URL=http://localhost:11434
-DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
 ```
 
@@ -78,7 +77,7 @@ services:
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
+
       APP_TZ: Europe/Berlin
     volumes:
       - llmproxy-data:/app/backend
@@ -111,7 +110,7 @@ services:
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
+
       APP_TZ: Europe/Berlin
       DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
     depends_on:
@@ -7,6 +7,7 @@ A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with
 - OpenAI-compatible endpoint (`/v1/chat/completions`, `/v1/models`)
 - API key management with daily and monthly token/request limits
 - Web-based admin interface (port 8001)
+- Model lock: enforces a specific model for all requests (useful for lab sessions/courses)
 - Streaming support (Server-Sent Events)
 - Tool use / function calling is passed through
 - Rotating usage logs
@@ -27,7 +28,6 @@ All API endpoints require the `ADMIN_PASSWORD`; access without a valid
 |----------|----------|--------------|
 | `ADMIN_PASSWORD` | – | **Required.** Password for the admin interface |
 | `OLLAMA_URL` | `http://localhost:11434` | URL of the Ollama server (without `/v1` suffix) |
-| `DEFAULT_MODEL` | `llama3` | Model used when the client does not specify one |
 | `DATABASE_URL` | `sqlite:///./test.db` | Database connection string (SQLite or PostgreSQL) |
 | `PROXY_HOST` | `0.0.0.0` | Bind address of the proxy |
 | `PROXY_PORT` | `8000` | Port of the proxy |
@@ -59,7 +59,6 @@ volumes:
 ```env
 ADMIN_PASSWORD=changeme
 OLLAMA_URL=http://localhost:11434
-DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
 ```
 
@@ -78,7 +77,7 @@ services:
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
+
       APP_TZ: Europe/Berlin
     volumes:
       - llmproxy-data:/app/backend
@@ -111,7 +110,7 @@ services:
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
+
       APP_TZ: Europe/Berlin
       DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
     depends_on:
@@ -166,3 +166,12 @@ The web interface for managing API keys and quotas is available at:
 **`http://141.75.33.244:8001`**
 
 API keys can be created, deactivated, and assigned quotas there.
+
+### Model lock for lab sessions
+
+Under **Einstellungen → Aktives Modell (Lock)** (Settings → Active model (lock)), a model can be pinned. When a lock is set, the `model` field in every request is replaced with this model, regardless of what the client sends. This prevents uncoordinated model switches during a session, which would slow down all participants through long model loading times.
+
+Typical workflow for a lab session:
+1. Before the session: load a suitable model in Ollama
+2. Activate the lock in the admin interface
+3. After the session: deactivate the lock again (clear the field)
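Steps 2 and 3 of the workflow could presumably also be scripted against the admin API, since this commit exposes `force_model` through `PUT /api/settings`. The sketch below only builds the requests and does not send them. The host `localhost` and the helper name `build_settings_request` are assumptions, and authentication is left out on purpose because the diff does not show how `require_admin_auth` expects credentials.

```python
import json
import urllib.request
from typing import Optional

ADMIN_BASE = "http://localhost:8001"  # assumption: admin API host and port


def build_settings_request(ollama_url: str, force_model: Optional[str]) -> urllib.request.Request:
    """Build (but do not send) a PUT /api/settings request.

    force_model=None clears the lock. Authentication is deliberately
    omitted: add whatever credentials the deployment requires.
    """
    payload = {"ollama_url": ollama_url, "force_model": force_model}
    return urllib.request.Request(
        f"{ADMIN_BASE}/api/settings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Step 2: activate the lock; step 3: clear it again.
activate = build_settings_request("http://localhost:11434", "llama3")
clear = build_settings_request("http://localhost:11434", None)
print(activate.get_method(), activate.full_url)
print(json.loads(clear.data))
```

Sending a request would be a matter of passing it to `urllib.request.urlopen` once the required credentials have been added.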
@@ -33,7 +33,6 @@ ADMIN_HOST=0.0.0.0
 ADMIN_PORT=8001
 DATABASE_URL=sqlite:///./test.db
 OLLAMA_URL=http://localhost:11434
-DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
 LOG_FILE=logs/usage.log
 ```
@@ -47,7 +46,6 @@ LOG_FILE=logs/usage.log
 | `ADMIN_PORT` | `8001` | Port of the admin API |
 | `DATABASE_URL` | `sqlite:///./test.db` | DB connection string (SQLite or PostgreSQL) |
 | `OLLAMA_URL` | `http://localhost:11434` | Address of the Ollama instance (can also be changed in the UI) |
-| `DEFAULT_MODEL` | `llama3` | Default model for `/v1/chat/completions` (can also be changed in the UI) |
 | `APP_TZ` | `Europe/Berlin` | Time zone for daily/monthly quota resets |
 | `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
 | `ALLOWED_ORIGINS` | `http://localhost:5173` | CORS origins (only relevant for development) |
@@ -137,7 +137,7 @@ async def get_proxy_info(_ = Depends(require_admin_auth)):
 async def read_settings(db: Session = Depends(get_db), _ = Depends(require_admin_auth)):
     return schemas.Settings(
         ollama_url=crud.get_setting(db, "ollama_url", "http://localhost:11434"),
-        default_model=crud.get_setting(db, "default_model", "llama3"),
+        force_model=crud.get_setting(db, "force_model") or None,
     )
 
 @app.put("/api/settings", response_model=schemas.Settings)
@@ -148,8 +148,8 @@ async def update_settings(
 ):
     ollama_url = settings.ollama_url.rstrip('/').removesuffix('/v1')
     crud.set_setting(db, "ollama_url", ollama_url)
-    crud.set_setting(db, "default_model", settings.default_model)
-    return schemas.Settings(ollama_url=ollama_url, default_model=settings.default_model)
+    crud.set_setting(db, "force_model", settings.force_model or "")
+    return schemas.Settings(ollama_url=ollama_url, force_model=settings.force_model or None)
 
 @app.get("/api/ollama-models")
 async def get_ollama_models(
@@ -13,8 +13,6 @@ def init_db():
     db = SessionLocal()
     if not get_setting(db, "ollama_url"):
         set_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
-    if not get_setting(db, "default_model"):
-        set_setting(db, "default_model", os.getenv("DEFAULT_MODEL", "llama3"))
     db.close()
 
     print("Database initialized.")
@@ -70,8 +70,6 @@ def apply_env_settings():
     try:
         if url := os.getenv("OLLAMA_URL"):
             crud.set_setting(db, "ollama_url", url)
-        if model := os.getenv("DEFAULT_MODEL"):
-            crud.set_setting(db, "default_model", model)
         db.commit()
     finally:
         db.close()
@@ -91,6 +89,9 @@ async def proxy_request(url: str, method: str = "GET", json_data: dict = None):
 async def generate(request: Request, db: Session = Depends(get_db)):
     ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
     body = await request.json()
+    force_model = crud.get_setting(db, "force_model") or None
+    if force_model:
+        body = {**body, "model": force_model}
     prompt_tokens = crud.count_tokens(body.get("prompt", ""))
 
     if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
@@ -115,6 +116,9 @@ async def generate(request: Request, db: Session = Depends(get_db)):
 async def chat(request: Request, db: Session = Depends(get_db)):
     ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
     body = await request.json()
+    force_model = crud.get_setting(db, "force_model") or None
+    if force_model:
+        body = {**body, "model": force_model}
     messages = body.get("messages", [])
     prompt_tokens = sum(crud.count_tokens(_content_to_str(msg.get("content"))) for msg in messages)
 
@@ -156,19 +160,19 @@ async def list_openai_models(db: Session = Depends(get_db)):
 @app.post("/v1/chat/completions")
 async def openai_chat_completions(request: Request, db: Session = Depends(get_db)):
     ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
-    default_model = crud.get_setting(db, "default_model", os.getenv("DEFAULT_MODEL", "llama3"))
 
     body = await request.json()
+    force_model = crud.get_setting(db, "force_model") or None
+    if force_model:
+        body = {**body, "model": force_model}
     messages = body.get("messages", [])
     prompt_tokens = sum(crud.count_tokens(_content_to_str(msg.get("content"))) for msg in messages)
 
     if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
         raise HTTPException(status_code=429, detail="Quota exceeded")
 
-    if "model" not in body:
-        body = {**body, "model": default_model}
+    model_name = body.get("model", "?")
 
-    model_name = body["model"]
     usage_log.info('%s | /v1/chat/completions | %s | ~%d tokens | "%s"',
                    request.state.api_key_name, model_name, prompt_tokens, _last_user_msg(messages))
 
@@ -40,7 +40,7 @@ class QuotaUpdate(BaseModel):
 
 class Settings(BaseModel):
     ollama_url: str
-    default_model: str
+    force_model: Optional[str] = None
 
 class UsageStats(BaseModel):
     tokens_used_today: int = 0
@@ -95,8 +95,8 @@ function SettingsSection({ password }) {
       const { models, reachable } = res.data;
       setOllamaReachable(reachable);
       setAvailableModels(models);
-      if (models.length > 0 && !models.includes(currentModel)) {
-        setSettings(s => ({ ...s, default_model: models[0] }));
+      if (models.length > 0 && currentModel && !models.includes(currentModel)) {
+        setSettings(s => ({ ...s, force_model: models[0] }));
       }
     } catch {
       setOllamaReachable(false);
@@ -115,7 +115,7 @@ function SettingsSection({ password }) {
       const s = settingsRes.data;
       setSettings(s);
       setProxyEndpoint(proxyRes.data.endpoint);
-      fetchModels(s.ollama_url, s.default_model);
+      fetchModels(s.ollama_url, s.force_model);
     }).catch(() => setError('Einstellungen konnten nicht geladen werden.'));
   }, []);
 
@@ -152,7 +152,7 @@ function SettingsSection({ password }) {
               type="url"
               value={settings.ollama_url}
               onChange={(e) => setSettings({ ...settings, ollama_url: e.target.value })}
-              onBlur={(e) => fetchModels(e.target.value, settings.default_model)}
+              onBlur={(e) => fetchModels(e.target.value, settings.force_model)}
               placeholder="http://localhost:11434"
               required
             />
@@ -162,23 +162,23 @@ function SettingsSection({ password }) {
           </div>
         </div>
         <div className="settings-row">
-          <label>Standard-Modell</label>
+          <label>Aktives Modell (Lock)</label>
           {modelsLoading ? (
             <span className="settings-value">Lade Modelle…</span>
           ) : availableModels.length > 0 ? (
             <select
-              value={settings.default_model}
-              onChange={(e) => setSettings({ ...settings, default_model: e.target.value })}
+              value={settings.force_model || ""}
+              onChange={(e) => setSettings({ ...settings, force_model: e.target.value || null })}
             >
+              <option value="">— kein Lock —</option>
               {availableModels.map(m => <option key={m} value={m}>{m}</option>)}
             </select>
           ) : (
             <input
               type="text"
-              value={settings.default_model}
-              onChange={(e) => setSettings({ ...settings, default_model: e.target.value })}
-              placeholder="llama3"
-              required
+              value={settings.force_model || ""}
+              onChange={(e) => setSettings({ ...settings, force_model: e.target.value || null })}
+              placeholder="leer = kein Lock"
             />
           )}
         </div>