Compare commits
3 Commits
256bafe30d...f823e7d314
Commits (SHA1): f823e7d314, 34b108f4df, cced65693c
@@ -7,6 +7,7 @@ A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API ke
 - OpenAI-compatible endpoint (`/v1/chat/completions`, `/v1/models`)
 - API key management with daily and monthly token/request limits
 - Web-based admin interface (port 8001)
+- Model lock: enforces a specific model for all requests (useful for courses and lab sessions)
 - Streaming support (Server-Sent Events)
 - Tool use / function calling passthrough
 - Rotating usage logs
@ -27,7 +28,6 @@ All API endpoints require the `ADMIN_PASSWORD` — without a valid token, only t
 |----------|---------|-------------|
 | `ADMIN_PASSWORD` | – | **Required.** Password for the admin interface |
 | `OLLAMA_URL` | `http://localhost:11434` | URL of the Ollama server (without `/v1` suffix) |
-| `DEFAULT_MODEL` | `llama3` | Model used when the client does not specify one |
 | `DATABASE_URL` | `sqlite:///./test.db` | Database connection string (SQLite or PostgreSQL) |
 | `PROXY_HOST` | `0.0.0.0` | Proxy bind address |
 | `PROXY_PORT` | `8000` | Proxy port |
@@ -59,7 +59,6 @@ volumes:
 ```env
 ADMIN_PASSWORD=changeme
 OLLAMA_URL=http://localhost:11434
-DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
 ```
 
@@ -78,7 +77,7 @@ services:
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
+
       APP_TZ: Europe/Berlin
     volumes:
       - llmproxy-data:/app/backend
@@ -111,7 +110,7 @@ services:
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
+
       APP_TZ: Europe/Berlin
       DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
     depends_on:
@@ -7,6 +7,7 @@ A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with
 - OpenAI-compatible endpoint (`/v1/chat/completions`, `/v1/models`)
 - API key management with daily and monthly token/request limits
 - Web-based admin interface (port 8001)
+- Model lock: enforces a specific model for all requests (useful for lab sessions/courses)
 - Streaming support (Server-Sent Events)
 - Tool use / function calling is passed through
 - Rotating usage logs
@@ -27,7 +28,6 @@ All API endpoints require the `ADMIN_PASSWORD`; access without a valid
 |----------|----------|--------------|
 | `ADMIN_PASSWORD` | – | **Required.** Password for the admin interface |
 | `OLLAMA_URL` | `http://localhost:11434` | URL of the Ollama server (without `/v1` suffix) |
-| `DEFAULT_MODEL` | `llama3` | Model used when the client does not specify one |
 | `DATABASE_URL` | `sqlite:///./test.db` | Database connection string (SQLite or PostgreSQL) |
 | `PROXY_HOST` | `0.0.0.0` | Bind address of the proxy |
 | `PROXY_PORT` | `8000` | Port of the proxy |
@@ -59,7 +59,6 @@ volumes:
 ```env
 ADMIN_PASSWORD=changeme
 OLLAMA_URL=http://localhost:11434
-DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
 ```
 
@@ -78,7 +77,7 @@ services:
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
+
       APP_TZ: Europe/Berlin
     volumes:
       - llmproxy-data:/app/backend
@@ -111,7 +110,7 @@ services:
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
+
       APP_TZ: Europe/Berlin
       DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
     depends_on:
KURZANLEITUNG.md (normal file, 177 lines added)
@@ -0,0 +1,177 @@
+# LLM Service: Quick Start Guide
+
+## What is this about?
+
+The service provides **large language models (LLMs)** through a simple HTTP API that can be called directly from Python scripts, Jupyter notebooks, or your own applications. The models run locally on a GPU server in the intranet, with no data leaving the network and no cloud costs.
+
+Typical use cases:
+
+- Summarizing, translating, or rephrasing text
+- AI-assisted coding (e.g. with **[opencode](https://opencode.ai)**)
+- Experimenting with prompt engineering and integrating LLMs into your own projects
+
+---
+
+## Access
+
+The service is reachable **only from within the intranet**.
+
+| | |
+|---|---|
+| **API endpoint** | `http://141.75.33.244:8000` |
+| **Authentication** | API key required (request one from the admin by e-mail) |
+
+---
+
+## Available models
+
+| Model | Size | Note |
+|---|---|---|
+| `gemma4:31b` | 19 GB | compact, fast |
+| `gpt-oss:20b` | 13 GB | compact, fast |
+| `gpt-oss:120b` | 65 GB | very capable |
+| `qwen3.5:122b` | 81 GB | very capable |
+| `qwen3-coder-next:q8_0` | 84 GB | specialized for code |
+
+> **Important:** Only **one model at a time** can be loaded into GPU memory.
+> When someone switches models, the previous one must be unloaded and the new one loaded,
+> which can take **several minutes**. The first prompt after a model switch is
+> therefore noticeably slower. After that, the model stays loaded for some time.
+
+---
+
+## Python example: a simple prompt
+
+The API follows the **OpenAI API conventions**, so the `openai` library can be used directly.
+
+```bash
+pip install openai
+```
+
+```python
+from openai import OpenAI
+
+API_KEY = "sk-..."  # insert your API key
+BASE_URL = "http://141.75.33.244:8000/v1"
+MODEL = "gemma4:31b"  # choose a model as needed
+
+client = OpenAI(api_key=API_KEY, base_url=BASE_URL)
+
+response = client.chat.completions.create(
+    model=MODEL,
+    messages=[
+        {"role": "user", "content": "Explain the difference between L1 and L2 regularization."}
+    ]
+)
+
+print(response.choices[0].message.content)
+```
+
+---
+
+## Python example: listing and selecting models
+
+```python
+from openai import OpenAI
+
+API_KEY = "sk-..."
+BASE_URL = "http://141.75.33.244:8000/v1"
+
+client = OpenAI(api_key=API_KEY, base_url=BASE_URL)
+
+# Fetch the available models
+models = client.models.list()
+for m in models.data:
+    print(m.id)
+
+# Prompt a specific model
+response = client.chat.completions.create(
+    model="qwen3-coder-next:q8_0",
+    messages=[
+        {"role": "system", "content": "You are a helpful coding assistant."},
+        {"role": "user", "content": "Write a Python function that computes the Fibonacci sequence."}
+    ]
+)
+
+print(response.choices[0].message.content)
+```
+
+---
+
+## Usage recommendations
+
+- **Small model first** (`gemma4:31b` or `gpt-oss:20b`): much faster and sufficient for many tasks.
+- **Large model** only for complex tasks (`qwen3.5:122b`, `gpt-oss:120b`).
+- **Coding tasks**: `qwen3-coder-next:q8_0` is specifically optimized for them.
+- Where possible, use **the same model as other users** to avoid frequent model switches.
+
+---
+
+## Quotas
+
+Depending on the API key, the following limits may be configured:
+
+- Maximum **requests per day / month**
+- Maximum **tokens per day / month**
+
+When a limit is exceeded, the API returns status code `429 Too Many Requests`.
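The quota rule described above can be sketched as a pure function. This is only an illustration under stated assumptions: the `Limits` and `Usage` classes and the names `would_exceed` and `status_code_for` are hypothetical and do not come from the project, whose actual check lives in `crud.check_and_increment_quota` and is not shown in this diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    """Per-key limits (hypothetical shape); None means no limit is configured."""
    requests_per_day: Optional[int] = None
    requests_per_month: Optional[int] = None
    tokens_per_day: Optional[int] = None
    tokens_per_month: Optional[int] = None


@dataclass
class Usage:
    """Counters already consumed in the current day and month."""
    requests_today: int = 0
    requests_this_month: int = 0
    tokens_today: int = 0
    tokens_this_month: int = 0


def would_exceed(limits: Limits, usage: Usage, tokens: int, requests: int = 1) -> bool:
    """Return True if adding this request would push any configured limit over."""
    projected = [
        (limits.requests_per_day, usage.requests_today + requests),
        (limits.requests_per_month, usage.requests_this_month + requests),
        (limits.tokens_per_day, usage.tokens_today + tokens),
        (limits.tokens_per_month, usage.tokens_this_month + tokens),
    ]
    return any(limit is not None and value > limit for limit, value in projected)


def status_code_for(limits: Limits, usage: Usage, tokens: int) -> int:
    """429 when a limit would be exceeded, otherwise 200."""
    return 429 if would_exceed(limits, usage, tokens) else 200
```

Because a daily or monthly quota does not recover within seconds, a client that receives `429` should probably report the error rather than retry in a tight loop.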
+
+---
+
+## Coding assistant: opencode
+
+[opencode](https://opencode.ai) is a terminal-based AI coding agent (similar to Claude Code) that supports OpenAI-compatible APIs and can therefore be pointed directly at the intranet service.
+
+### Installation
+
+```bash
+npm install -g opencode-ai
+# or
+curl -fsSL https://opencode.ai/install | bash
+```
+
+### Configuration
+
+Create a configuration file at `~/.config/opencode/config.json`:
+
+```json
+{
+  "$schema": "https://opencode.ai/config.json",
+  "providers": {
+    "openai": {
+      "apiKey": "sk-...",
+      "baseURL": "http://141.75.33.244:8000/v1"
+    }
+  },
+  "model": "openai/qwen3-coder-next:q8_0"
+}
+```
+
+For coding tasks, `qwen3-coder-next:q8_0` is recommended; for general tasks, `gemma4:31b` or `gpt-oss:20b`.
+
+### Starting
+
+```bash
+opencode
+```
+
+opencode opens an interactive TUI in the terminal and can then be used inside the project directory: reading files, generating code, suggesting refactorings, and so on.
+
+---
+
+## Administration (admins only)
+
+The web interface for managing API keys and quotas is available at:
+
+**`http://141.75.33.244:8001`**
+
+API keys can be created, deactivated, and assigned quotas there.
+
+### Model lock for lab sessions
+
+Under **Einstellungen → Aktives Modell (Lock)** (Settings → Active model (Lock)), a model can be fixed for all users. When a lock is set, the `model` field of every request is replaced with this model, regardless of what the client sends. This prevents uncoordinated model switches during a session, which would slow down all participants through long loading times.
+
+Typical workflow for a lab session:
+1. Before the session: load the appropriate model in Ollama
+2. Activate the lock in the admin interface
+3. After the session: deactivate the lock again (clear the field)
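The lock behaviour described in the guide can be sketched as a small pure function. It mirrors the lines this change adds to the proxy handlers, with two assumptions for the sake of a self-contained example: the function name `apply_model_lock` is hypothetical, and a plain `ValueError` subclass stands in for the `HTTPException(status_code=422, ...)` that the handlers raise.

```python
from typing import Any, Dict, Optional


class ModelRequired(ValueError):
    """Stand-in for the 422 error the proxy returns (assumption for this sketch)."""


def apply_model_lock(body: Dict[str, Any], force_model: Optional[str]) -> Dict[str, Any]:
    """Return the request body with the lock applied.

    A non-empty lock overrides whatever model the client sent. Without a lock,
    the client must name a model itself, since there is no default any more.
    """
    if force_model:
        # Copy instead of mutating, as the handlers do with {**body, ...}.
        body = {**body, "model": force_model}
    if not body.get("model"):
        raise ModelRequired("Field 'model' is required")
    return body


locked = apply_model_lock({"model": "gpt-oss:120b", "messages": []}, "gemma4:31b")
print(locked["model"])  # the locked model wins over the client's choice
```

An empty string and `None` both count as "no lock", which matches how the setting is stored (empty string) and returned (`None`) in this change.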
@@ -33,7 +33,6 @@ ADMIN_HOST=0.0.0.0
 ADMIN_PORT=8001
 DATABASE_URL=sqlite:///./test.db
 OLLAMA_URL=http://localhost:11434
-DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
 LOG_FILE=logs/usage.log
 ```
@@ -47,7 +46,6 @@ LOG_FILE=logs/usage.log
 | `ADMIN_PORT` | `8001` | Port of the admin API |
 | `DATABASE_URL` | `sqlite:///./test.db` | DB connection string (SQLite or PostgreSQL) |
 | `OLLAMA_URL` | `http://localhost:11434` | Address of the Ollama instance (can also be changed in the UI) |
-| `DEFAULT_MODEL` | `llama3` | Default model for `/v1/chat/completions` (can also be changed in the UI) |
 | `APP_TZ` | `Europe/Berlin` | Time zone for daily/monthly quota resets |
 | `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
 | `ALLOWED_ORIGINS` | `http://localhost:5173` | CORS origins (only relevant for development) |
@@ -137,7 +137,7 @@ async def get_proxy_info(_ = Depends(require_admin_auth)):
 async def read_settings(db: Session = Depends(get_db), _ = Depends(require_admin_auth)):
     return schemas.Settings(
         ollama_url=crud.get_setting(db, "ollama_url", "http://localhost:11434"),
-        default_model=crud.get_setting(db, "default_model", "llama3"),
+        force_model=crud.get_setting(db, "force_model") or None,
     )
 
 @app.put("/api/settings", response_model=schemas.Settings)
@@ -148,8 +148,8 @@ async def update_settings(
 ):
     ollama_url = settings.ollama_url.rstrip('/').removesuffix('/v1')
     crud.set_setting(db, "ollama_url", ollama_url)
-    crud.set_setting(db, "default_model", settings.default_model)
-    return schemas.Settings(ollama_url=ollama_url, default_model=settings.default_model)
+    crud.set_setting(db, "force_model", settings.force_model or "")
+    return schemas.Settings(ollama_url=ollama_url, force_model=settings.force_model or None)
 
 @app.get("/api/ollama-models")
 async def get_ollama_models(
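The two normalizations in `update_settings` can be tried in isolation. The sketch below copies the expressions from the hunk above; the helper names `normalize_ollama_url`, `lock_to_storage`, and `lock_from_storage` are hypothetical and exist only for this illustration. Note that `str.removesuffix` requires Python 3.9 or later.

```python
from typing import Optional


def normalize_ollama_url(url: str) -> str:
    """Strip trailing slashes, then a trailing '/v1', as update_settings does."""
    return url.rstrip("/").removesuffix("/v1")


def lock_to_storage(force_model: Optional[str]) -> str:
    """The settings table stores strings, so 'no lock' is written as ''."""
    return force_model or ""


def lock_from_storage(stored: Optional[str]) -> Optional[str]:
    """Reading maps '' (or a missing row) back to None, meaning 'no lock'."""
    return stored or None


print(normalize_ollama_url("http://localhost:11434/v1/"))
print(repr(lock_from_storage(lock_to_storage(None))))
```

One consequence of the order of operations: a URL such as `http://host/v1/` is cleaned up, but a slash left in front of the suffix after stripping (for example `http://host//v1`) keeps its extra slash.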
@@ -13,8 +13,6 @@ def init_db():
     db = SessionLocal()
     if not get_setting(db, "ollama_url"):
         set_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
-    if not get_setting(db, "default_model"):
-        set_setting(db, "default_model", os.getenv("DEFAULT_MODEL", "llama3"))
     db.close()
 
     print("Database initialized.")
@@ -70,8 +70,6 @@ def apply_env_settings():
     try:
         if url := os.getenv("OLLAMA_URL"):
             crud.set_setting(db, "ollama_url", url)
-        if model := os.getenv("DEFAULT_MODEL"):
-            crud.set_setting(db, "default_model", model)
         db.commit()
     finally:
         db.close()
@@ -91,6 +89,11 @@ async def proxy_request(url: str, method: str = "GET", json_data: dict = None):
 async def generate(request: Request, db: Session = Depends(get_db)):
     ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
     body = await request.json()
+    force_model = crud.get_setting(db, "force_model") or None
+    if force_model:
+        body = {**body, "model": force_model}
+    if not body.get("model"):
+        raise HTTPException(status_code=422, detail="Field 'model' is required")
     prompt_tokens = crud.count_tokens(body.get("prompt", ""))
 
     if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
@@ -101,7 +104,11 @@ async def generate(request: Request, db: Session = Depends(get_db)):
                    request.state.api_key_name, body.get("model", "?"), prompt_tokens, prompt_preview)
     try:
         response = await proxy_request(f"{ollama_url}/api/generate", method="POST", json_data=body)
-        return JSONResponse(content=response.json(), status_code=response.status_code)
+        resp_json = response.json()
+        usage_log.info('%s | /api/generate | %s | actual ↑%d ↓%d tokens',
+                       request.state.api_key_name, body.get("model", "?"),
+                       resp_json.get("prompt_eval_count", 0), resp_json.get("eval_count", 0))
+        return JSONResponse(content=resp_json, status_code=response.status_code)
     except Exception as exc:
         error_log.error("Proxy error | %s | /api/generate | %s | %s: %s",
                         request.state.api_key_name, body.get("model", "?"), type(exc).__name__, exc, exc_info=exc)
@@ -111,6 +118,11 @@ async def generate(request: Request, db: Session = Depends(get_db)):
 async def chat(request: Request, db: Session = Depends(get_db)):
     ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
     body = await request.json()
+    force_model = crud.get_setting(db, "force_model") or None
+    if force_model:
+        body = {**body, "model": force_model}
+    if not body.get("model"):
+        raise HTTPException(status_code=422, detail="Field 'model' is required")
     messages = body.get("messages", [])
     prompt_tokens = sum(crud.count_tokens(_content_to_str(msg.get("content"))) for msg in messages)
 
@@ -121,7 +133,11 @@ async def chat(request: Request, db: Session = Depends(get_db)):
                    request.state.api_key_name, body.get("model", "?"), prompt_tokens, _last_user_msg(messages))
     try:
         response = await proxy_request(f"{ollama_url}/api/chat", method="POST", json_data=body)
-        return JSONResponse(content=response.json(), status_code=response.status_code)
+        resp_json = response.json()
+        usage_log.info('%s | /api/chat | %s | actual ↑%d ↓%d tokens',
+                       request.state.api_key_name, body.get("model", "?"),
+                       resp_json.get("prompt_eval_count", 0), resp_json.get("eval_count", 0))
+        return JSONResponse(content=resp_json, status_code=response.status_code)
     except Exception as exc:
         error_log.error("Proxy error | %s | /api/chat | %s | %s: %s",
                         request.state.api_key_name, body.get("model", "?"), type(exc).__name__, exc, exc_info=exc)
@@ -148,19 +164,21 @@ async def list_openai_models(db: Session = Depends(get_db)):
 @app.post("/v1/chat/completions")
 async def openai_chat_completions(request: Request, db: Session = Depends(get_db)):
     ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
-    default_model = crud.get_setting(db, "default_model", os.getenv("DEFAULT_MODEL", "llama3"))
 
     body = await request.json()
+    force_model = crud.get_setting(db, "force_model") or None
+    if force_model:
+        body = {**body, "model": force_model}
+    if not body.get("model"):
+        raise HTTPException(status_code=422, detail="Field 'model' is required")
     messages = body.get("messages", [])
     prompt_tokens = sum(crud.count_tokens(_content_to_str(msg.get("content"))) for msg in messages)
 
     if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
         raise HTTPException(status_code=429, detail="Quota exceeded")
 
-    if "model" not in body:
-        body = {**body, "model": default_model}
 
     model_name = body["model"]
 
     usage_log.info('%s | /v1/chat/completions | %s | ~%d tokens | "%s"',
                    request.state.api_key_name, model_name, prompt_tokens, _last_user_msg(messages))
@@ -185,7 +203,12 @@ async def openai_chat_completions(request: Request, db: Session = Depends(get_db
 
     try:
         response = await proxy_request(target, method="POST", json_data=body)
-        return JSONResponse(content=response.json(), status_code=response.status_code)
+        resp_json = response.json()
+        usage = resp_json.get("usage", {})
+        usage_log.info('%s | /v1/chat/completions | %s | actual ↑%d ↓%d tokens',
+                       request.state.api_key_name, model_name,
+                       usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0))
+        return JSONResponse(content=resp_json, status_code=response.status_code)
     except Exception as exc:
         error_log.error("Proxy error | %s | /v1/chat/completions | %s | %s: %s",
                         request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
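The three handlers above read the actual token counts from two different response shapes: `prompt_eval_count` / `eval_count` at the top level for `/api/generate` and `/api/chat`, and a nested `usage` object for `/v1/chat/completions`. A minimal sketch of a shared helper follows; the name `extract_token_usage` is hypothetical and no such helper exists in this change. It also tolerates `"usage": null`, which the `resp_json.get("usage", {})` call in the diff would not.

```python
from typing import Any, Dict, Tuple


def extract_token_usage(resp_json: Dict[str, Any]) -> Tuple[int, int]:
    """Return (prompt_tokens, completion_tokens) from either response shape.

    Missing fields count as 0, matching the defaults used in the handlers.
    """
    usage = resp_json.get("usage")
    if isinstance(usage, dict):
        # OpenAI-compatible shape: {"usage": {"prompt_tokens": ..., "completion_tokens": ...}}
        return int(usage.get("prompt_tokens", 0)), int(usage.get("completion_tokens", 0))
    # Ollama-native shape: counters sit at the top level of the response.
    return int(resp_json.get("prompt_eval_count", 0)), int(resp_json.get("eval_count", 0))


up, down = extract_token_usage({"prompt_eval_count": 12, "eval_count": 34})
print(f"actual ↑{up} ↓{down} tokens")
```

This covers only non-streaming JSON responses, which is also the only path the logging lines in this diff touch.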
@@ -40,7 +40,7 @@ class QuotaUpdate(BaseModel):
 
 class Settings(BaseModel):
     ollama_url: str
-    default_model: str
+    force_model: Optional[str] = None
 
 class UsageStats(BaseModel):
     tokens_used_today: int = 0
@@ -95,8 +95,8 @@ function SettingsSection({ password }) {
       const { models, reachable } = res.data;
       setOllamaReachable(reachable);
       setAvailableModels(models);
-      if (models.length > 0 && !models.includes(currentModel)) {
-        setSettings(s => ({ ...s, default_model: models[0] }));
+      if (models.length > 0 && currentModel && !models.includes(currentModel)) {
+        setSettings(s => ({ ...s, force_model: models[0] }));
       }
     } catch {
       setOllamaReachable(false);
@@ -115,7 +115,7 @@ function SettingsSection({ password }) {
       const s = settingsRes.data;
       setSettings(s);
       setProxyEndpoint(proxyRes.data.endpoint);
-      fetchModels(s.ollama_url, s.default_model);
+      fetchModels(s.ollama_url, s.force_model);
     }).catch(() => setError('Einstellungen konnten nicht geladen werden.'));
   }, []);
 
@@ -152,7 +152,7 @@ function SettingsSection({ password }) {
               type="url"
               value={settings.ollama_url}
               onChange={(e) => setSettings({ ...settings, ollama_url: e.target.value })}
-              onBlur={(e) => fetchModels(e.target.value, settings.default_model)}
+              onBlur={(e) => fetchModels(e.target.value, settings.force_model)}
               placeholder="http://localhost:11434"
               required
             />
@@ -162,23 +162,23 @@ function SettingsSection({ password }) {
           </div>
         </div>
         <div className="settings-row">
-          <label>Standard-Modell</label>
+          <label>Aktives Modell (Lock)</label>
           {modelsLoading ? (
             <span className="settings-value">Lade Modelle…</span>
           ) : availableModels.length > 0 ? (
             <select
-              value={settings.default_model}
-              onChange={(e) => setSettings({ ...settings, default_model: e.target.value })}
+              value={settings.force_model || ""}
+              onChange={(e) => setSettings({ ...settings, force_model: e.target.value || null })}
             >
+              <option value="">— kein Lock —</option>
               {availableModels.map(m => <option key={m} value={m}>{m}</option>)}
             </select>
           ) : (
             <input
               type="text"
-              value={settings.default_model}
-              onChange={(e) => setSettings({ ...settings, default_model: e.target.value })}
-              placeholder="llama3"
-              required
+              value={settings.force_model || ""}
+              onChange={(e) => setSettings({ ...settings, force_model: e.target.value || null })}
+              placeholder="leer = kein Lock"
             />
           )}
         </div>