Compare commits
No commits in common. "main" and "0.9.8" have entirely different histories.
@@ -42,8 +42,6 @@ docker-compose.yml
|
|||||||
|
|
||||||
# Docs
|
# Docs
|
||||||
*.md
|
*.md
|
||||||
*.tex
|
|
||||||
*.pdf
|
|
||||||
|
|
||||||
# Dev & build scripts
|
# Dev & build scripts
|
||||||
run_dev.py
|
run_dev.py
|
||||||
|
|||||||
@@ -16,8 +16,3 @@ DATABASE_URL=sqlite:///./test.db
|
|||||||
OLLAMA_URL=http://localhost:11434
|
OLLAMA_URL=http://localhost:11434
|
||||||
DEFAULT_MODEL=llama3
|
DEFAULT_MODEL=llama3
|
||||||
APP_TZ=Europe/Berlin
|
APP_TZ=Europe/Berlin
|
||||||
|
|
||||||
# Default model for the Anthropic-compatible endpoint (/v1/messages)
|
|
||||||
# Used when the client specifies no model or when an Anthropic model name
|
|
||||||
# (e.g. claude-opus-4-7) does not match any local model.
|
|
||||||
ANTHROPIC_DEFAULT_MODEL=llama3
|
|
||||||
|
|||||||
7
.gitignore
vendored
@@ -27,10 +27,3 @@ frontend/dist/
|
|||||||
|
|
||||||
# Misc
|
# Misc
|
||||||
config.json
|
config.json
|
||||||
|
|
||||||
# Generated documents
|
|
||||||
KURZANLEITUNG.tex
|
|
||||||
KURZANLEITUNG.pdf
|
|
||||||
|
|
||||||
# Internal planning docs
|
|
||||||
docs/
|
|
||||||
@@ -1,14 +1,12 @@
|
|||||||
# mediaeng/llmproxy
|
# mediaeng/llmproxy
|
||||||
|
|
||||||
A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with configurable token and request quotas. Incoming requests in OpenAI-compatible or Anthropic-compatible format are authenticated, checked against the quota, and forwarded to the configured Ollama server.
|
A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with configurable token and request quotas. Incoming requests in OpenAI-compatible format are authenticated, checked against the quota, and forwarded to the configured Ollama server.
|
||||||
|
|
||||||
## Features
|
## Features
|
||||||
|
|
||||||
- OpenAI-compatible endpoint (`/v1/chat/completions`, `/v1/models`)
|
- OpenAI-compatible endpoint (`/v1/chat/completions`, `/v1/models`)
|
||||||
- Anthropic Messages API (`/v1/messages`) — compatible with Claude Code CLI and Anthropic SDK clients
|
|
||||||
- API key management with daily and monthly token/request limits
|
- API key management with daily and monthly token/request limits
|
||||||
- Web-based admin interface (port 8001)
|
- Web-based admin interface (port 8001)
|
||||||
- Model lock: enforces a specific model for all requests (useful for courses and lab sessions)
|
|
||||||
- Streaming support (Server-Sent Events)
|
- Streaming support (Server-Sent Events)
|
||||||
- Tool use / function calling passthrough
|
- Tool use / function calling passthrough
|
||||||
- Rotating usage logs
|
- Rotating usage logs
|
||||||
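The streaming feature in the list above can be consumed on the client side roughly as follows. This is a minimal sketch, assuming the proxy forwards OpenAI-style Server-Sent Events unchanged (`data: {...}` lines, terminated by `data: [DONE]`). The helper name `iter_stream_text` and the sample lines are illustrative and not taken from the project.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments contained in OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hypothetical stream, shaped like OpenAI chat completion chunks.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hal"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

In practice the lines would come from an HTTP response iterated line by line; the parsing stays the same.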
@@ -18,7 +16,7 @@ A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API ke
|
|||||||
|
|
||||||
| Port | Service |
|
| Port | Service |
|
||||||
|------|---------|
|
|------|---------|
|
||||||
| `8000` | Proxy endpoint (OpenAI and Anthropic API) |
|
| `8000` | Proxy endpoint (OpenAI API) |
|
||||||
| `8001` | Admin API + web interface |
|
| `8001` | Admin API + web interface |
|
||||||
|
|
||||||
All API endpoints require the `ADMIN_PASSWORD` — without a valid token, only the public frontend files (HTML/JS/CSS of the login page) are accessible. The password is therefore the primary protection.
|
All API endpoints require the `ADMIN_PASSWORD` — without a valid token, only the public frontend files (HTML/JS/CSS of the login page) are accessible. The password is therefore the primary protection.
|
||||||
@@ -29,6 +27,7 @@ All API endpoints require the `ADMIN_PASSWORD` — without a valid token, only t
|
|||||||
|----------|---------|-------------|
|
|----------|---------|-------------|
|
||||||
| `ADMIN_PASSWORD` | – | **Required.** Password for the admin interface |
|
| `ADMIN_PASSWORD` | – | **Required.** Password for the admin interface |
|
||||||
| `OLLAMA_URL` | `http://localhost:11434` | URL of the Ollama server (without `/v1` suffix) |
|
| `OLLAMA_URL` | `http://localhost:11434` | URL of the Ollama server (without `/v1` suffix) |
|
||||||
|
| `DEFAULT_MODEL` | `llama3` | Model used when the client does not specify one |
|
||||||
| `DATABASE_URL` | `sqlite:///./test.db` | Database connection string (SQLite or PostgreSQL) |
|
| `DATABASE_URL` | `sqlite:///./test.db` | Database connection string (SQLite or PostgreSQL) |
|
||||||
| `PROXY_HOST` | `0.0.0.0` | Proxy bind address |
|
| `PROXY_HOST` | `0.0.0.0` | Proxy bind address |
|
||||||
| `PROXY_PORT` | `8000` | Proxy port |
|
| `PROXY_PORT` | `8000` | Proxy port |
|
||||||
@@ -36,7 +35,6 @@ All API endpoints require the `ADMIN_PASSWORD` — without a valid token, only t
|
|||||||
| `ADMIN_PORT` | `8001` | Admin API port |
|
| `ADMIN_PORT` | `8001` | Admin API port |
|
||||||
| `APP_TZ` | `Europe/Berlin` | Timezone for daily/monthly quota resets |
|
| `APP_TZ` | `Europe/Berlin` | Timezone for daily/monthly quota resets |
|
||||||
| `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
|
| `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
|
||||||
| `ANTHROPIC_DEFAULT_MODEL` | – | Default model for `/v1/messages` (Ollama model name, e.g. `llama3`) |
|
|
||||||
|
|
||||||
## Docker Compose – Ollama on the Host (Linux, recommended)
|
## Docker Compose – Ollama on the Host (Linux, recommended)
|
||||||
|
|
||||||
@@ -61,8 +59,8 @@ volumes:
|
|||||||
```env
|
```env
|
||||||
ADMIN_PASSWORD=changeme
|
ADMIN_PASSWORD=changeme
|
||||||
OLLAMA_URL=http://localhost:11434
|
OLLAMA_URL=http://localhost:11434
|
||||||
|
DEFAULT_MODEL=llama3
|
||||||
APP_TZ=Europe/Berlin
|
APP_TZ=Europe/Berlin
|
||||||
ANTHROPIC_DEFAULT_MODEL=llama3
|
|
||||||
```
|
```
|
||||||
|
|
||||||
## Docker Compose – Ollama as Container, SQLite
|
## Docker Compose – Ollama as Container, SQLite
|
||||||
@@ -80,8 +78,8 @@ services:
|
|||||||
environment:
|
environment:
|
||||||
ADMIN_PASSWORD: changeme
|
ADMIN_PASSWORD: changeme
|
||||||
OLLAMA_URL: http://ollama:11434
|
OLLAMA_URL: http://ollama:11434
|
||||||
|
DEFAULT_MODEL: llama3
|
||||||
APP_TZ: Europe/Berlin
|
APP_TZ: Europe/Berlin
|
||||||
ANTHROPIC_DEFAULT_MODEL: llama3
|
|
||||||
volumes:
|
volumes:
|
||||||
- llmproxy-data:/app/backend
|
- llmproxy-data:/app/backend
|
||||||
depends_on:
|
depends_on:
|
||||||
@@ -113,9 +111,9 @@ services:
|
|||||||
environment:
|
environment:
|
||||||
ADMIN_PASSWORD: changeme
|
ADMIN_PASSWORD: changeme
|
||||||
OLLAMA_URL: http://ollama:11434
|
OLLAMA_URL: http://ollama:11434
|
||||||
|
DEFAULT_MODEL: llama3
|
||||||
APP_TZ: Europe/Berlin
|
APP_TZ: Europe/Berlin
|
||||||
DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
|
DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
|
||||||
ANTHROPIC_DEFAULT_MODEL: llama3
|
|
||||||
depends_on:
|
depends_on:
|
||||||
db:
|
db:
|
||||||
condition: service_healthy
|
condition: service_healthy
|
||||||
@@ -150,23 +148,9 @@ volumes:
|
|||||||
|
|
||||||
## Client Configuration
|
## Client Configuration
|
||||||
|
|
||||||
**OpenAI-compatible client:**
|
Configure the proxy as an OpenAI-compatible endpoint:
|
||||||
|
|
||||||
```
|
```
|
||||||
Base URL: http://<host>:8000/v1
|
Base URL: http://<host>:8000/v1
|
||||||
API Key: <API key created in the admin interface>
|
API Key: <API key created in the admin interface>
|
||||||
```
|
```
|
||||||
|
|
||||||
**Claude Code CLI:**
|
|
||||||
```bash
|
|
||||||
ANTHROPIC_BASE_URL=http://<host>:8000 \
|
|
||||||
ANTHROPIC_AUTH_TOKEN=<API key created in the admin interface> \
|
|
||||||
claude
|
|
||||||
```
|
|
||||||
|
|
||||||
## Acknowledgements
|
|
||||||
|
|
||||||
The Anthropic Messages API endpoint (`/v1/messages`) was inspired by [free-claude-code](https://github.com/Alishahryar1/free-claude-code) by Ali Khokhar, which pursues a similar approach for routing Claude Code requests to alternative LLM backends.
|
|
||||||
|
|
||||||
## License
|
|
||||||
|
|
||||||
MIT — © 2026 Oliver Hofmann. See [LICENSE](https://git.efi.th-nuernberg.de/gitea/hofmannol/llmproxy/src/branch/main/LICENSE) for details.
|
|
||||||
|
|||||||
28
DOCKERHUB.md
@@ -1,14 +1,12 @@
|
|||||||
# mediaeng/llmproxy
|
# mediaeng/llmproxy
|
||||||
|
|
||||||
A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with configurable token and request quotas. Incoming requests in OpenAI-compatible or Anthropic-compatible format are authenticated, checked against the quota, and forwarded to the configured Ollama server.
|
A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with configurable token and request quotas. Incoming requests in OpenAI-compatible format are authenticated, checked against the quota, and forwarded to the configured Ollama server.
|
||||||
|
|
||||||
## Features
|
## Features
|
||||||
|
|
||||||
- OpenAI-compatible endpoint (`/v1/chat/completions`, `/v1/models`)
|
- OpenAI-compatible endpoint (`/v1/chat/completions`, `/v1/models`)
|
||||||
- Anthropic Messages API (`/v1/messages`), compatible with Claude Code CLI and Anthropic SDK clients
|
|
||||||
- API key management with daily and monthly token/request limits
|
- API key management with daily and monthly token/request limits
|
||||||
- Web-based admin interface (port 8001)
|
- Web-based admin interface (port 8001)
|
||||||
- Model lock: enforces a specific model for all requests (useful for lab sessions and courses)
|
|
||||||
- Streaming support (Server-Sent Events)
|
- Streaming support (Server-Sent Events)
|
||||||
- Tool use / function calling is passed through
|
- Tool use / function calling is passed through
|
||||||
- Rotating usage logs
|
- Rotating usage logs
|
||||||
@@ -18,7 +16,7 @@ A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with
|
|||||||
|
|
||||||
| Port | Service |
|
| Port | Service |
|
||||||
|------|--------|
|
|------|--------|
|
||||||
| `8000` | Proxy endpoint (OpenAI and Anthropic API) |
|
| `8000` | Proxy endpoint (OpenAI API) |
|
||||||
| `8001` | Admin API + web interface |
|
| `8001` | Admin API + web interface |
|
||||||
|
|
||||||
All API endpoints require the `ADMIN_PASSWORD`; without a valid token, only the public frontend files (HTML/JS/CSS of the login page) are accessible. The password is therefore the primary protection.
|
All API endpoints require the `ADMIN_PASSWORD`; without a valid token, only the public frontend files (HTML/JS/CSS of the login page) are accessible. The password is therefore the primary protection.
|
||||||
@@ -29,6 +27,7 @@ All API endpoints require the `ADMIN_PASSWORD`; without a valid
|
|||||||
|----------|----------|--------------|
|
|----------|----------|--------------|
|
||||||
| `ADMIN_PASSWORD` | – | **Required.** Password for the admin interface |
|
| `ADMIN_PASSWORD` | – | **Required.** Password for the admin interface |
|
||||||
| `OLLAMA_URL` | `http://localhost:11434` | URL of the Ollama server (without `/v1` suffix) |
|
| `OLLAMA_URL` | `http://localhost:11434` | URL of the Ollama server (without `/v1` suffix) |
|
||||||
|
| `DEFAULT_MODEL` | `llama3` | Model used when the client does not specify one |
|
||||||
| `DATABASE_URL` | `sqlite:///./test.db` | Database connection string (SQLite or PostgreSQL) |
|
| `DATABASE_URL` | `sqlite:///./test.db` | Database connection string (SQLite or PostgreSQL) |
|
||||||
| `PROXY_HOST` | `0.0.0.0` | Proxy bind address |
|
| `PROXY_HOST` | `0.0.0.0` | Proxy bind address |
|
||||||
| `PROXY_PORT` | `8000` | Proxy port |
|
| `PROXY_PORT` | `8000` | Proxy port |
|
||||||
@@ -36,7 +35,6 @@ All API endpoints require the `ADMIN_PASSWORD`; without a valid
|
|||||||
| `ADMIN_PORT` | `8001` | Admin API port |
|
| `ADMIN_PORT` | `8001` | Admin API port |
|
||||||
| `APP_TZ` | `Europe/Berlin` | Timezone for daily/monthly quota resets |
|
| `APP_TZ` | `Europe/Berlin` | Timezone for daily/monthly quota resets |
|
||||||
| `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
|
| `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
|
||||||
| `ANTHROPIC_DEFAULT_MODEL` | – | Default model for `/v1/messages` (Ollama model name, e.g. `llama3`) |
|
|
||||||
|
|
||||||
## Docker Compose – Ollama on the Host (Linux, recommended)
|
## Docker Compose – Ollama on the Host (Linux, recommended)
|
||||||
|
|
||||||
@@ -61,8 +59,8 @@ volumes:
|
|||||||
```env
|
```env
|
||||||
ADMIN_PASSWORD=changeme
|
ADMIN_PASSWORD=changeme
|
||||||
OLLAMA_URL=http://localhost:11434
|
OLLAMA_URL=http://localhost:11434
|
||||||
|
DEFAULT_MODEL=llama3
|
||||||
APP_TZ=Europe/Berlin
|
APP_TZ=Europe/Berlin
|
||||||
ANTHROPIC_DEFAULT_MODEL=llama3
|
|
||||||
```
|
```
|
||||||
|
|
||||||
## Docker Compose – Ollama as Container, SQLite
|
## Docker Compose – Ollama as Container, SQLite
|
||||||
@@ -80,8 +78,8 @@ services:
|
|||||||
environment:
|
environment:
|
||||||
ADMIN_PASSWORD: changeme
|
ADMIN_PASSWORD: changeme
|
||||||
OLLAMA_URL: http://ollama:11434
|
OLLAMA_URL: http://ollama:11434
|
||||||
|
DEFAULT_MODEL: llama3
|
||||||
APP_TZ: Europe/Berlin
|
APP_TZ: Europe/Berlin
|
||||||
ANTHROPIC_DEFAULT_MODEL: llama3
|
|
||||||
volumes:
|
volumes:
|
||||||
- llmproxy-data:/app/backend
|
- llmproxy-data:/app/backend
|
||||||
depends_on:
|
depends_on:
|
||||||
@@ -113,9 +111,9 @@ services:
|
|||||||
environment:
|
environment:
|
||||||
ADMIN_PASSWORD: changeme
|
ADMIN_PASSWORD: changeme
|
||||||
OLLAMA_URL: http://ollama:11434
|
OLLAMA_URL: http://ollama:11434
|
||||||
|
DEFAULT_MODEL: llama3
|
||||||
APP_TZ: Europe/Berlin
|
APP_TZ: Europe/Berlin
|
||||||
DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
|
DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
|
||||||
ANTHROPIC_DEFAULT_MODEL: llama3
|
|
||||||
depends_on:
|
depends_on:
|
||||||
db:
|
db:
|
||||||
condition: service_healthy
|
condition: service_healthy
|
||||||
@@ -150,19 +148,9 @@ volumes:
|
|||||||
|
|
||||||
## Client Configuration
|
## Client Configuration
|
||||||
|
|
||||||
**OpenAI-compatible client:**
|
Configure the proxy as an OpenAI-compatible endpoint:
|
||||||
|
|
||||||
```
|
```
|
||||||
Base URL: http://<host>:8000/v1
|
Base URL: http://<host>:8000/v1
|
||||||
API Key: <API key created in the admin interface>
|
API Key: <API key created in the admin interface>
|
||||||
```
|
```
|
||||||
|
|
||||||
**Claude Code CLI:**
|
|
||||||
```bash
|
|
||||||
ANTHROPIC_BASE_URL=http://<host>:8000 \
|
|
||||||
ANTHROPIC_AUTH_TOKEN=<API-Key> \
|
|
||||||
claude
|
|
||||||
```
|
|
||||||
|
|
||||||
## License
|
|
||||||
|
|
||||||
MIT, © 2026 Oliver Hofmann. See [LICENSE](https://git.efi.th-nuernberg.de/gitea/hofmannol/llmproxy/src/branch/main/LICENSE) for details.
|
|
||||||
|
|||||||
@@ -6,8 +6,6 @@ COPY frontend/ frontend/
|
|||||||
RUN npm run build --prefix frontend
|
RUN npm run build --prefix frontend
|
||||||
|
|
||||||
FROM python:3.12-slim
|
FROM python:3.12-slim
|
||||||
ARG APP_VERSION=dev
|
|
||||||
ENV APP_VERSION=$APP_VERSION
|
|
||||||
WORKDIR /app
|
WORKDIR /app
|
||||||
|
|
||||||
COPY backend/requirements.txt .
|
COPY backend/requirements.txt .
|
||||||
|
|||||||
204
KURZANLEITUNG.md
@@ -1,204 +0,0 @@
|
|||||||
# LLM Service – Quick Start Guide
|
|
||||||
|
|
||||||
## What Is This About?
|
|
||||||
|
|
||||||
The service provides **large language models (LLMs)** through a simple HTTP API that can be called directly from Python scripts, Jupyter notebooks, or your own applications. The models run locally on a GPU server in the intranet, with no data leaving the network and no cloud costs.
|
|
||||||
|
|
||||||
Typical use cases:
|
|
||||||
|
|
||||||
- Summarizing, translating, or rephrasing text
|
|
||||||
- AI-assisted coding (e.g. with **[opencode](https://opencode.ai)**)
|
|
||||||
- Experiments with prompt engineering and LLM integration in your own projects
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Access
|
|
||||||
|
|
||||||
The service is reachable **only from the intranet (VPN)**.
|
|
||||||
|
|
||||||
| | |
|
|
||||||
|---|---|
|
|
||||||
| **API endpoint** | `http://141.75.33.244:8000` |
|
|
||||||
| **Authentication** | API key required (request one from the admin by e-mail) |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Available Models
|
|
||||||
|
|
||||||
| Model | Size | Note |
|
|
||||||
|---|---|---|
|
|
||||||
| `gemma4:e4b` | 9.6 GB | very fast, for simple tasks |
|
|
||||||
| `gemma4:31b` | 19 GB | compact, fast |
|
|
||||||
| `gpt-oss:20b` | 13 GB | compact, fast |
|
|
||||||
| `gpt-oss:120b` | 65 GB | very capable |
|
|
||||||
| `qwen3.5:122b` | 81 GB | very capable |
|
|
||||||
| `qwen3-coder-next:q8_0` | 84 GB | specialized for code |
|
|
||||||
|
|
||||||
> **Important:** Only **one model at a time** can be loaded into GPU memory.
|
|
||||||
> If someone switches models, the previous one has to be unloaded and the new one loaded,
|
|
||||||
> which can take **several minutes**. The first prompt after a model switch is
|
|
||||||
> therefore noticeably slower. After that, the model stays loaded for some time.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Python Example – Simple Prompt
|
|
||||||
|
|
||||||
The API follows the **OpenAI standard**, so the `openai` library can be used directly.
|
|
||||||
|
|
||||||
```bash
|
|
||||||
pip install openai
|
|
||||||
```
|
|
||||||
|
|
||||||
```python
|
|
||||||
from openai import OpenAI
|
|
||||||
|
|
||||||
API_KEY = "sk-..."  # insert your API key
|
|
||||||
BASE_URL = "http://141.75.33.244:8000/v1"
|
|
||||||
MODEL = "gemma4:31b"  # choose the model as needed
|
|
||||||
|
|
||||||
client = OpenAI(api_key=API_KEY, base_url=BASE_URL)
|
|
||||||
|
|
||||||
response = client.chat.completions.create(
|
|
||||||
    model=MODEL,
|
|
||||||
    messages=[
|
|
||||||
        {"role": "user", "content": "Explain the difference between L1 and L2 regularization."}
|
|
||||||
    ]
|
|
||||||
)
|
|
||||||
|
|
||||||
print(response.choices[0].message.content)
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Python Example – Listing Available Models
|
|
||||||
|
|
||||||
```python
|
|
||||||
from openai import OpenAI
|
|
||||||
|
|
||||||
API_KEY = "sk-..."
|
|
||||||
BASE_URL = "http://141.75.33.244:8000/v1"
|
|
||||||
|
|
||||||
client = OpenAI(api_key=API_KEY, base_url=BASE_URL)
|
|
||||||
|
|
||||||
models = client.models.list()
|
|
||||||
for m in models.data:
|
|
||||||
    print(m.id)
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Checking the Currently Loaded Model
|
|
||||||
|
|
||||||
Because only one model can be in memory at a time, the following call shows which model is currently active:
|
|
||||||
|
|
||||||
```python
|
|
||||||
import httpx
|
|
||||||
|
|
||||||
r = httpx.get(
|
|
||||||
    "http://141.75.33.244:8000/api/ps",
|
|
||||||
    headers={"Authorization": "Bearer sk-..."}
|
|
||||||
)
|
|
||||||
print(r.json())
|
|
||||||
```
|
|
||||||
|
|
||||||
The response contains the model name, its size, and how long the model will remain in memory.
|
|
||||||
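A response of this kind could be condensed into one line per loaded model. This is a minimal sketch, assuming the proxy passes Ollama's `/api/ps` JSON through unchanged and that each entry carries `name`, `size` (bytes), and `expires_at`. The function name and the sample payload are illustrative.

```python
def summarize_loaded_models(ps_response: dict) -> list[str]:
    """Build one human-readable line per model in an /api/ps style payload."""
    lines = []
    for model in ps_response.get("models", []):
        name = model.get("name", "?")
        size_gb = model.get("size", 0) / 1e9  # bytes to decimal gigabytes
        expires = model.get("expires_at", "unknown")
        lines.append(f"{name}: {size_gb:.1f} GB, loaded until {expires}")
    return lines


# Hypothetical payload; with httpx this would be r.json().
sample = {
    "models": [
        {
            "name": "gemma4:31b",
            "size": 19_000_000_000,
            "expires_at": "2026-01-15T12:30:00+01:00",
        }
    ]
}
for line in summarize_loaded_models(sample):
    print(line)
```

An empty `models` list would mean that no model is currently loaded.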
|
|
||||||
---
|
|
||||||
|
|
||||||
## Usage Recommendations
|
|
||||||
|
|
||||||
- **Small model first** (`gemma4:31b` or `gpt-oss:20b`): much faster and sufficient for many tasks.
|
|
||||||
- **Large model** only for complex tasks (`qwen3.5:122b`, `gpt-oss:120b`).
|
|
||||||
- **Coding tasks**: `qwen3-coder-next:q8_0` is optimized specifically for this.
|
|
||||||
- Where possible, use **the same model as other users** to avoid frequent model switches.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Quotas
|
|
||||||
|
|
||||||
Depending on the API key, the following limits may be configured:
|
|
||||||
|
|
||||||
- Maximum **requests per day / month**
|
|
||||||
- Maximum **tokens per day / month**
|
|
||||||
|
|
||||||
When a limit is exceeded, the API returns status code `429 Too Many Requests`.
|
|
||||||
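Because these limits are per day or per month, retrying a `429` after a few seconds is unlikely to help. The sketch below estimates how long a client would have to wait for the daily limit to reset. It assumes that daily quotas reset at local midnight in the server's configured timezone; the guide does not state the exact reset time, and both function names are illustrative.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def seconds_until_daily_reset(now: datetime, tz: tzinfo) -> float:
    """Seconds from `now` until the next local midnight in `tz`.

    Assumption: the daily quota resets at local midnight.
    """
    local_now = now.astimezone(tz)
    next_midnight = (local_now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # Subtract in UTC so a daylight-saving change in between is counted correctly.
    delta = next_midnight.astimezone(timezone.utc) - local_now.astimezone(timezone.utc)
    return delta.total_seconds()


def describe_status(status_code: int, now: datetime, tz: tzinfo) -> str:
    """Turn an HTTP status code into a short hint for the user."""
    if status_code != 429:
        return "ok"
    hours = seconds_until_daily_reset(now, tz) / 3600
    return f"quota exhausted; the daily limit may reset in about {hours:.1f} h"


# Fixed UTC+1 offset as a stand-in; zoneinfo.ZoneInfo("Europe/Berlin") could be
# passed instead to follow daylight saving time.
cet = timezone(timedelta(hours=1))
now = datetime(2026, 1, 15, 23, 0, tzinfo=cet)
print(describe_status(429, now, cet))
```

If the monthly limit is the one that was exceeded, the wait is correspondingly longer; the status code alone does not say which limit was hit.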
|
|
||||||
---
|
|
||||||
|
|
||||||
## Coding Assistant: opencode
|
|
||||||
|
|
||||||
[opencode](https://opencode.ai) is a terminal-based AI coding agent (similar to Claude Code) that supports OpenAI-compatible APIs and can therefore be pointed directly at the intranet service.
|
|
||||||
|
|
||||||
### Installation
|
|
||||||
|
|
||||||
```bash
|
|
||||||
npm install -g opencode-ai
|
|
||||||
# or
|
|
||||||
curl -fsSL https://opencode.ai/install | bash
|
|
||||||
```
|
|
||||||
|
|
||||||
### Configuration
|
|
||||||
|
|
||||||
Create the configuration file at `~/.config/opencode/config.json`:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"$schema": "https://opencode.ai/config.json",
|
|
||||||
"providers": {
|
|
||||||
"openai": {
|
|
||||||
"apiKey": "sk-...",
|
|
||||||
"baseURL": "http://141.75.33.244:8000/v1"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"model": "openai/qwen3-coder-next:q8_0"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
For coding tasks, `qwen3-coder-next:q8_0` is recommended; for general tasks, `gemma4:31b` or `gpt-oss:20b`.
|
|
||||||
|
|
||||||
### Starting
|
|
||||||
|
|
||||||
```bash
|
|
||||||
opencode
|
|
||||||
```
|
|
||||||
|
|
||||||
opencode opens an interactive terminal interface and can then be used in the project directory to read files, generate code, suggest refactorings, and so on.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Coding Assistant: Claude Code
|
|
||||||
|
|
||||||
[Claude Code](https://claude.ai/code) is Anthropic's official AI coding agent for the terminal. Anyone who already has Claude Code access can run it against local models through the intranet service, without transferring data to Anthropic.
|
|
||||||
|
|
||||||
### Prerequisite
|
|
||||||
|
|
||||||
Active Claude Code access (Claude Pro or Team).
|
|
||||||
|
|
||||||
### Starting
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ANTHROPIC_BASE_URL=http://141.75.33.244:8000 \
|
|
||||||
ANTHROPIC_AUTH_TOKEN=sk-... \
|
|
||||||
claude
|
|
||||||
```
|
|
||||||
|
|
||||||
The model to use is preconfigured by the admin via `ANTHROPIC_DEFAULT_MODEL`, so manual model selection is not necessary.
|
|
||||||
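The fallback to `ANTHROPIC_DEFAULT_MODEL` can be pictured roughly as follows. This is a minimal sketch, assuming the proxy keeps a requested model when it exists locally and otherwise substitutes the configured default. The function `resolve_model` and the exact matching rule are illustrative, not the project's actual code.

```python
from typing import Collection, Optional


def resolve_model(
    requested: Optional[str],
    local_models: Collection[str],
    default_model: str,
) -> str:
    """Pick the model to send to Ollama for an Anthropic-style request."""
    if requested and requested in local_models:
        return requested  # the client asked for a model that exists locally
    return default_model  # no model given, or an Anthropic name without a local match


local = {"llama3", "gemma4:31b"}
print(resolve_model("claude-opus-4-7", local, "llama3"))
print(resolve_model("gemma4:31b", local, "llama3"))
```

Claude Code sends Anthropic model names by default, so in this sketch its requests would end up on the default model.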
|
|
||||||
---
|
|
||||||
|
|
||||||
## Administration (admins only)
|
|
||||||
|
|
||||||
The web interface for managing API keys and quotas is available at:
|
|
||||||
|
|
||||||
**`http://141.75.33.244:8001`**
|
|
||||||
|
|
||||||
There, API keys can be created, deactivated, and assigned quotas.
|
|
||||||
|
|
||||||
### Model Lock for Lab Sessions
|
|
||||||
|
|
||||||
Under **Einstellungen → Aktives Modell (Lock)** (Settings → Active model), a model can be fixed. When a lock is set, the `model` field of every request is replaced with this model, regardless of what the client sends. This prevents uncoordinated model switches during a session, which would slow down all participants through long load times.
|
|
||||||
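The effect of the lock on a request body can be sketched in a few lines. This is an illustration only, assuming the lock is stored as a plain model name and that an empty value means "no lock". The function `apply_model_lock` is hypothetical and not taken from the project.

```python
from typing import Any, Optional


def apply_model_lock(payload: dict[str, Any], locked_model: Optional[str]) -> dict[str, Any]:
    """Return a copy of the request body with `model` forced to the locked model.

    An empty or missing lock leaves the client's choice untouched.
    """
    if not locked_model:
        return dict(payload)
    return {**payload, "model": locked_model}


request_body = {"model": "gpt-oss:120b", "messages": [{"role": "user", "content": "Hi"}]}
print(apply_model_lock(request_body, "gemma4:31b")["model"])
print(apply_model_lock(request_body, None)["model"])
```

Returning a copy keeps the original request available, for example for logging which model the client asked for.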
|
|
||||||
Typical workflow for a lab session:
|
|
||||||
1. Before the session: load a suitable model in Ollama
|
|
||||||
2. Enable the lock in the admin interface
|
|
||||||
3. After the session: disable the lock again (clear the field)
|
|
||||||
27
LICENSE
@@ -1,27 +0,0 @@
|
|||||||
MIT License
|
|
||||||
|
|
||||||
Copyright (c) 2026 Oliver Hofmann
|
|
||||||
|
|
||||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
||||||
of this software and associated documentation files (the "Software"), to deal
|
|
||||||
in the Software without restriction, including without limitation the rights
|
|
||||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
||||||
copies of the Software, and to permit persons to whom the Software is
|
|
||||||
furnished to do so, subject to the following conditions:
|
|
||||||
|
|
||||||
The above copyright notice and this permission notice shall be included in all
|
|
||||||
copies or substantial portions of the Software.
|
|
||||||
|
|
||||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
||||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
||||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
||||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
||||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
||||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
||||||
SOFTWARE.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
Portions of this software were inspired by free-claude-code
|
|
||||||
(https://github.com/Alishahryar1/free-claude-code),
|
|
||||||
copyright (c) 2026 Ali Khokhar, MIT License.
|
|
||||||
48
README.md
@@ -4,13 +4,12 @@ Ollama bietet von sich aus keine Authentifizierung — wer die API erreicht, kan
|
|||||||
|
|
||||||
## Features
|
## Features
|
||||||
|
|
||||||
- API key authentication (Bearer token, `sk-` prefix, `x-api-key` and `anthropic-auth-token` headers)
|
- API key authentication (Bearer token or `sk-` prefix)
|
||||||
- Optional expiry date per API key
|
- Optional expiry date per API key
|
||||||
- Quota management with separate daily and monthly limits (tokens & requests)
|
- Quota management with separate daily and monthly limits (tokens & requests)
|
||||||
- Token counting via tiktoken, reset boundaries in the configured timezone
|
- Token counting via tiktoken, reset boundaries in the Europe/Berlin timezone
|
||||||
- Web admin interface (manage API keys, Ollama settings, usage display)
|
- Web admin interface (manage API keys, Ollama settings, usage display)
|
||||||
- OpenAI-compatible `/v1/chat/completions` endpoint with streaming and tool use
|
- OpenAI-compatible `/v1/chat/completions` endpoint with streaming and tool use
|
||||||
- Anthropic Messages API `/v1/messages`, compatible with Claude Code CLI and Anthropic SDK clients
|
|
||||||
- Rotating usage logs
|
- Rotating usage logs
|
||||||
- SQLite (default) or PostgreSQL
|
- SQLite (default) or PostgreSQL
|
||||||
- Docker image on DockerHub: `mediaeng/llmproxy`
|
- Docker image on DockerHub: `mediaeng/llmproxy`
|
||||||
@@ -34,9 +33,9 @@ ADMIN_HOST=0.0.0.0
|
|||||||
ADMIN_PORT=8001
|
ADMIN_PORT=8001
|
||||||
DATABASE_URL=sqlite:///./test.db
|
DATABASE_URL=sqlite:///./test.db
|
||||||
OLLAMA_URL=http://localhost:11434
|
OLLAMA_URL=http://localhost:11434
|
||||||
|
DEFAULT_MODEL=llama3
|
||||||
APP_TZ=Europe/Berlin
|
APP_TZ=Europe/Berlin
|
||||||
LOG_FILE=logs/usage.log
|
LOG_FILE=logs/usage.log
|
||||||
ANTHROPIC_DEFAULT_MODEL=llama3
|
|
||||||
```
|
```
|
||||||
|
|
||||||
| Variable | Default | Description |
|
| Variable | Default | Description |
|
||||||
@@ -48,10 +47,10 @@ ANTHROPIC_DEFAULT_MODEL=llama3
|
|||||||
| `ADMIN_PORT` | `8001` | Admin API port |
|
| `ADMIN_PORT` | `8001` | Admin API port |
|
||||||
| `DATABASE_URL` | `sqlite:///./test.db` | DB connection string (SQLite or PostgreSQL) |
|
| `DATABASE_URL` | `sqlite:///./test.db` | DB connection string (SQLite or PostgreSQL) |
|
||||||
| `OLLAMA_URL` | `http://localhost:11434` | Address of the Ollama instance (can also be changed in the UI) |
|
| `OLLAMA_URL` | `http://localhost:11434` | Address of the Ollama instance (can also be changed in the UI) |
|
||||||
|
| `DEFAULT_MODEL` | `llama3` | Default model for `/v1/chat/completions` (can also be changed in the UI) |
|
||||||
| `APP_TZ` | `Europe/Berlin` | Timezone for daily/monthly quota resets |
|
| `APP_TZ` | `Europe/Berlin` | Timezone for daily/monthly quota resets |
|
||||||
| `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
|
| `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
|
||||||
| `ALLOWED_ORIGINS` | `http://localhost:5173` | CORS origins (only relevant for development) |
|
| `ALLOWED_ORIGINS` | `http://localhost:5173` | CORS origins (only relevant for development) |
|
||||||
| `ANTHROPIC_DEFAULT_MODEL` | – | Default model for `/v1/messages` (Ollama model name) |
|
|
||||||
|
|
||||||
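The defaults in the table above could be read from the environment along the following lines. This is a minimal sketch covering only the variables visible in this hunk. The `Settings` class, the `load_settings` function, and the comma-separated format for `ALLOWED_ORIGINS` are assumptions for illustration, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    admin_port: int
    database_url: str
    ollama_url: str
    app_tz: str
    log_file: str
    allowed_origins: tuple[str, ...]
    anthropic_default_model: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from `env`, falling back to the documented defaults."""
    origins = env.get("ALLOWED_ORIGINS", "http://localhost:5173")
    return Settings(
        admin_port=int(env.get("ADMIN_PORT", "8001")),
        database_url=env.get("DATABASE_URL", "sqlite:///./test.db"),
        ollama_url=env.get("OLLAMA_URL", "http://localhost:11434"),
        app_tz=env.get("APP_TZ", "Europe/Berlin"),
        log_file=env.get("LOG_FILE", "logs/usage.log"),
        # Assumption: several origins are separated by commas.
        allowed_origins=tuple(o.strip() for o in origins.split(",") if o.strip()),
        # No documented default, so a missing value stays None.
        anthropic_default_model=env.get("ANTHROPIC_DEFAULT_MODEL") or None,
    )


defaults = load_settings({})
print(defaults.ollama_url, defaults.admin_port)
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in tests.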
## Development (local)
|
## Development (local)
|
||||||
|
|
||||||
@@ -81,23 +80,6 @@ Das Script prüft alle Ports auf Belegung, initialisiert die Datenbank und start
|
|||||||
|
|
||||||
Admin interface: `http://localhost:5173`
|
Admin interface: `http://localhost:5173`
|
||||||
|
|
||||||
## Claude Code CLI
|
|
||||||
|
|
||||||
The proxy provides an Anthropic-compatible endpoint through which Claude Code CLI can be used with local Ollama models.
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Set ANTHROPIC_DEFAULT_MODEL in .env, then:
|
|
||||||
./start_claude.sh
|
|
||||||
|
|
||||||
# Or with the key as an argument:
|
|
||||||
./start_claude.sh sk-your-api-key
|
|
||||||
|
|
||||||
# Or as an environment variable:
|
|
||||||
PROXY_API_KEY=sk-your-api-key ./start_claude.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
The script sets `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` automatically from `.env` and starts `claude`.
|
|
||||||
|
|
||||||
## Production (Docker)
|
## Production (Docker)
|
||||||
|
|
||||||
### Docker Compose (recommended)
|
### Docker Compose (recommended)
|
||||||
@@ -188,26 +170,17 @@ Clients then configure `https://llm.example.com/v1` as the base URL.
|
|||||||
|
|
||||||
## Proxy Endpoints (Port 8000)
|
## Proxy Endpoints (Port 8000)
|
||||||
|
|
||||||
All endpoints require a valid API key in the `Authorization` header (`Bearer sk-...`), the `x-api-key` header, or the `anthropic-auth-token` header.
|
All endpoints require a valid API key in the `Authorization` header.
|
||||||
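How a key might be picked out of these headers is sketched below. This is a minimal illustration, assuming the `Authorization` header takes precedence and may carry the key with or without the `Bearer` scheme. The function `extract_api_key` and the precedence order are assumptions, not the proxy's actual code.

```python
from typing import Mapping, Optional


def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the API key from the first header that carries one, else None."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    auth = lowered.get("authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() == "bearer":
        auth = token.strip()  # drop the "Bearer" scheme, keep the key
    if auth:
        return auth

    for name in ("x-api-key", "anthropic-auth-token"):
        if lowered.get(name):
            return lowered[name]
    return None


print(extract_api_key({"Authorization": "Bearer sk-xxxxxx"}))
print(extract_api_key({"x-api-key": "sk-xxxxxx"}))
```

The returned key would still have to be validated against the stored keys and their quotas.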
|
|
||||||
```bash
|
```bash
|
||||||
# OpenAI-compatible endpoint
|
|
||||||
curl -X POST http://localhost:8000/v1/chat/completions \
|
curl -X POST http://localhost:8000/v1/chat/completions \
|
||||||
-H "Authorization: Bearer sk-xxxxxx" \
|
-H "Authorization: Bearer sk-xxxxxx" \
|
||||||
-H "Content-Type: application/json" \
|
-H "Content-Type: application/json" \
|
||||||
-d '{"model":"llama3","messages":[{"role":"user","content":"Hallo"}]}'
|
-d '{"model":"llama3","messages":[{"role":"user","content":"Hallo"}]}'
|
||||||
|
|
||||||
# Anthropic-compatible endpoint (e.g. for Claude Code)
|
|
||||||
curl -X POST http://localhost:8000/v1/messages \
|
|
||||||
-H "x-api-key: sk-xxxxxx" \
|
|
||||||
-H "anthropic-version: 2023-06-01" \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{"model":"llama3","messages":[{"role":"user","content":"Hallo"}],"max_tokens":1024}'
|
|
||||||
```
|
```
|
||||||
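The Anthropic-style `curl` call above can also be assembled in Python using only the standard library. This is a sketch that builds the request without sending it. The helper `build_messages_request` is illustrative; the URL, headers, and body fields mirror the `curl` example.

```python
import json
import urllib.request


def build_messages_request(
    base_url: str, api_key: str, model: str, prompt: str, max_tokens: int = 1024
) -> urllib.request.Request:
    """Build a POST request for the Anthropic-compatible /v1/messages endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_messages_request("http://localhost:8000", "sk-xxxxxx", "llama3", "Hallo")
print(req.get_method(), req.full_url)
# Sending is left out here; urllib.request.urlopen(req) would perform the call.
```

Keeping construction and sending separate makes it possible to inspect the request before any network traffic happens.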
|
|
||||||
| Endpoint | Method | Description |
|
| Endpoint | Method | Description |
|
||||||
|----------|---------|--------------|
|
|----------|---------|--------------|
|
||||||
| `/v1/messages` | POST | Chat (Anthropic format, streaming + tool use) |
|
|
||||||
| `/v1/chat/completions` | POST | Chat (OpenAI format, streaming + tool use) |
|
| `/v1/chat/completions` | POST | Chat (OpenAI format, streaming + tool use) |
|
||||||
| `/v1/models` | GET | Models (OpenAI format) |
|
| `/v1/models` | GET | Models (OpenAI format) |
|
||||||
| `/api/generate` | POST | Ollama generate (native) |
|
| `/api/generate` | POST | Ollama generate (native) |
|
||||||
@@ -255,8 +228,7 @@ llm_quota/
|
|||||||
│ └── tests/
|
│ └── tests/
|
||||||
│ ├── conftest.py
|
│ ├── conftest.py
|
||||||
│ ├── test_auth.py
|
│ ├── test_auth.py
|
||||||
│ ├── test_quota.py
|
│ └── test_quota.py
|
||||||
│ └── test_anthropic_messages.py
|
|
||||||
├── frontend/
|
├── frontend/
|
||||||
│ └── src/
|
│ └── src/
|
||||||
│ ├── main.jsx # React-Admin-UI
|
│ ├── main.jsx # React-Admin-UI
|
||||||
@@ -268,19 +240,13 @@ llm_quota/
|
|||||||
├── docker-entrypoint.sh
|
├── docker-entrypoint.sh
|
||||||
├── .dockerignore
|
├── .dockerignore
|
||||||
├── start.sh # Development start script
|
├── start.sh # Development start script
|
||||||
├── start_claude.sh # Start Claude Code CLI through the proxy
|
|
||||||
├── run_dev.py # Development runner for PyCharm
|
├── run_dev.py # Development runner for PyCharm
|
||||||
├── build_push.sh # Docker build & push to DockerHub
|
├── build_push.sh # Docker build & push to DockerHub
|
||||||
├── LICENSE
|
|
||||||
├── DOCKERHUB.md # DockerHub description (German)
|
├── DOCKERHUB.md # DockerHub description (German)
|
||||||
├── DOCKERHUB.en.md # DockerHub description (English)
|
├── DOCKERHUB.en.md # DockerHub description (English)
|
||||||
└── .gitignore
|
└── .gitignore
|
||||||
```
|
```
|
||||||
|
|
||||||
## Acknowledgements
|
|
||||||
|
|
||||||
The Anthropic-compatible endpoint (`/v1/messages`) was inspired by the project [free-claude-code](https://github.com/Alishahryar1/free-claude-code) by Ali Khokhar, which takes a similar approach to forwarding Claude Code requests to alternative LLM backends.
|
|
||||||
|
|
||||||
## License
|
## License
|
||||||
|
|
||||||
MIT, see [LICENSE](LICENSE)
|
MIT
|
||||||
|
|||||||
@ -131,16 +131,13 @@ async def get_proxy_info(_ = Depends(require_admin_auth)):
|
|||||||
host = os.getenv("PROXY_HOST", "0.0.0.0")
|
host = os.getenv("PROXY_HOST", "0.0.0.0")
|
||||||
port = os.getenv("PROXY_PORT", "8000")
|
port = os.getenv("PROXY_PORT", "8000")
|
||||||
display_host = "localhost" if host in ("0.0.0.0", "::") else host
|
display_host = "localhost" if host in ("0.0.0.0", "::") else host
|
||||||
return {
|
return {"endpoint": f"http://{display_host}:{port}"}
|
||||||
"endpoint": f"http://{display_host}:{port}",
|
|
||||||
"version": os.getenv("APP_VERSION", "dev"),
|
|
||||||
}
|
|
||||||
|
|
||||||
@app.get("/api/settings", response_model=schemas.Settings)
|
@app.get("/api/settings", response_model=schemas.Settings)
|
||||||
async def read_settings(db: Session = Depends(get_db), _ = Depends(require_admin_auth)):
|
async def read_settings(db: Session = Depends(get_db), _ = Depends(require_admin_auth)):
|
||||||
return schemas.Settings(
|
return schemas.Settings(
|
||||||
ollama_url=crud.get_setting(db, "ollama_url", "http://localhost:11434"),
|
ollama_url=crud.get_setting(db, "ollama_url", "http://localhost:11434"),
|
||||||
force_model=crud.get_setting(db, "force_model") or None,
|
default_model=crud.get_setting(db, "default_model", "llama3"),
|
||||||
)
|
)
|
||||||
|
|
||||||
@app.put("/api/settings", response_model=schemas.Settings)
|
@app.put("/api/settings", response_model=schemas.Settings)
|
||||||
@ -151,8 +148,8 @@ async def update_settings(
|
|||||||
):
|
):
|
||||||
ollama_url = settings.ollama_url.rstrip('/').removesuffix('/v1')
|
ollama_url = settings.ollama_url.rstrip('/').removesuffix('/v1')
|
||||||
crud.set_setting(db, "ollama_url", ollama_url)
|
crud.set_setting(db, "ollama_url", ollama_url)
|
||||||
crud.set_setting(db, "force_model", settings.force_model or "")
|
crud.set_setting(db, "default_model", settings.default_model)
|
||||||
return schemas.Settings(ollama_url=ollama_url, force_model=settings.force_model or None)
|
return schemas.Settings(ollama_url=ollama_url, default_model=settings.default_model)
|
||||||
|
|
||||||
@app.get("/api/ollama-models")
|
@app.get("/api/ollama-models")
|
||||||
async def get_ollama_models(
|
async def get_ollama_models(
|
||||||
@ -169,18 +166,6 @@ async def get_ollama_models(
|
|||||||
except Exception:
|
except Exception:
|
||||||
return {"models": [], "reachable": False}
|
return {"models": [], "reachable": False}
|
||||||
|
|
||||||
@app.get("/api/logs/{name}")
|
|
||||||
async def get_log_lines(name: str, _ = Depends(require_admin_auth)):
|
|
||||||
if name not in ("usage", "error"):
|
|
||||||
raise HTTPException(status_code=400, detail="name must be 'usage' or 'error'")
|
|
||||||
log_file = Path(os.getenv("LOG_FILE", "logs/usage.log"))
|
|
||||||
path = log_file if name == "usage" else log_file.parent / "error.log"
|
|
||||||
try:
|
|
||||||
lines = path.read_text(encoding="utf-8").splitlines()
|
|
||||||
return {"lines": lines[-10:]}
|
|
||||||
except FileNotFoundError:
|
|
||||||
return {"lines": []}
|
|
||||||
|
|
||||||
# Serve the static frontend (production only, when dist/ exists)
|
# Serve the static frontend (production only, when dist/ exists)
|
||||||
_dist = Path(__file__).parent.parent / "frontend" / "dist"
|
_dist = Path(__file__).parent.parent / "frontend" / "dist"
|
||||||
if _dist.exists():
|
if _dist.exists():
|
||||||
|
|||||||
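The `get_log_lines` endpoint in this hunk reads the whole log file and then keeps the last ten lines. For large log files, a bounded-memory variant might look like the sketch below. `tail_lines` is a hypothetical helper, not part of the diff; it swaps `read_text().splitlines()` for a `collections.deque` with `maxlen`, which is a different technique from the one the endpoint uses.

```python
from collections import deque
from pathlib import Path


def tail_lines(path: Path, n: int = 10) -> list[str]:
    """Return the last n lines of a text file, or [] if the file does not exist."""
    try:
        with path.open(encoding="utf-8") as fh:
            # deque(maxlen=n) discards older lines while iterating,
            # so at most n lines are held in memory at once.
            return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
    except FileNotFoundError:
        return []
```

The missing-file case mirrors the endpoint's behavior of returning an empty list rather than an error.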
@ -1,20 +1,12 @@
|
|||||||
import os
|
import os
|
||||||
from pathlib import Path
|
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from sqlalchemy import create_engine
|
from sqlalchemy import create_engine
|
||||||
|
|
||||||
load_dotenv(dotenv_path=Path(__file__).resolve().parent.parent / ".env")
|
load_dotenv(dotenv_path=os.path.join(os.path.dirname(__file__), '..', '.env'))
|
||||||
from sqlalchemy.orm import sessionmaker, declarative_base
|
from sqlalchemy.orm import sessionmaker, declarative_base
|
||||||
|
|
||||||
DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///./test.db")
|
DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///./test.db")
|
||||||
|
|
||||||
# Always resolve relative SQLite paths relative to this file, not the cwd
|
|
||||||
if DATABASE_URL.startswith("sqlite:///") and not DATABASE_URL.startswith("sqlite:////"):
|
|
||||||
db_path = DATABASE_URL[len("sqlite:///"):]
|
|
||||||
if not os.path.isabs(db_path):
|
|
||||||
db_path = str(Path(__file__).resolve().parent / db_path)
|
|
||||||
DATABASE_URL = f"sqlite:///{db_path}"
|
|
||||||
|
|
||||||
if "sqlite" in DATABASE_URL:
|
if "sqlite" in DATABASE_URL:
|
||||||
engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
|
engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
|
||||||
else:
|
else:
|
||||||
|
|||||||
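The relative-path handling for SQLite URLs in the `database.py` hunk can be isolated into a small pure function, which makes the rule easier to test. The sketch below assumes POSIX-style paths; `resolve_sqlite_url` and the `base_dir` parameter are hypothetical names introduced for illustration, standing in for the inline logic and `Path(__file__).resolve().parent`.

```python
import os
from pathlib import Path


def resolve_sqlite_url(url: str, base_dir: Path) -> str:
    """Anchor a relative SQLite URL at base_dir; leave every other URL untouched."""
    prefix = "sqlite:///"
    # Non-SQLite URLs and four-slash (absolute path) URLs pass through unchanged.
    if not url.startswith(prefix) or url.startswith("sqlite:////"):
        return url
    db_path = url[len(prefix):]
    if os.path.isabs(db_path):
        return url
    # Joining with base_dir removes the dependency on the current working directory.
    return f"{prefix}{base_dir / db_path}"
```

With `base_dir` set to the backend directory, `sqlite:///./test.db` resolves to the same file no matter where the server process is started from.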
@ -13,6 +13,8 @@ def init_db():
|
|||||||
db = SessionLocal()
|
db = SessionLocal()
|
||||||
if not get_setting(db, "ollama_url"):
|
if not get_setting(db, "ollama_url"):
|
||||||
set_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
|
set_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
|
||||||
|
if not get_setting(db, "default_model"):
|
||||||
|
set_setting(db, "default_model", os.getenv("DEFAULT_MODEL", "llama3"))
|
||||||
db.close()
|
db.close()
|
||||||
|
|
||||||
print("Database initialized.")
|
print("Database initialized.")
|
||||||
|
|||||||
299
backend/main.py
@ -1,8 +1,5 @@
|
|||||||
import json
|
|
||||||
import logging
|
import logging
|
||||||
import os
|
import os
|
||||||
import secrets
|
|
||||||
import time
|
|
||||||
from logging.handlers import RotatingFileHandler
|
from logging.handlers import RotatingFileHandler
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
@ -52,16 +49,10 @@ def _last_user_msg(messages: list, max_len: int = 120) -> str:
|
|||||||
|
|
||||||
async def require_api_key(request: Request, db: Session = Depends(get_db)):
|
async def require_api_key(request: Request, db: Session = Depends(get_db)):
|
||||||
auth_header = request.headers.get("Authorization", "")
|
auth_header = request.headers.get("Authorization", "")
|
||||||
x_api_key = request.headers.get("x-api-key", "")
|
|
||||||
auth_token = request.headers.get("anthropic-auth-token", "")
|
|
||||||
if auth_header.startswith("Bearer "):
|
if auth_header.startswith("Bearer "):
|
||||||
api_key = auth_header[7:]
|
api_key = auth_header[7:]
|
||||||
elif auth_header.startswith("sk-"):
|
elif auth_header.startswith("sk-"):
|
||||||
api_key = auth_header
|
api_key = auth_header
|
||||||
elif x_api_key:
|
|
||||||
api_key = x_api_key
|
|
||||||
elif auth_token:
|
|
||||||
api_key = auth_token
|
|
||||||
else:
|
else:
|
||||||
raise HTTPException(status_code=401, detail="Invalid or missing API key")
|
raise HTTPException(status_code=401, detail="Invalid or missing API key")
|
||||||
db_key = crud.verify_api_key(db, api_key)
|
db_key = crud.verify_api_key(db, api_key)
|
||||||
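The header precedence in the version of `require_api_key` that also accepts `x-api-key` and `anthropic-auth-token` can be summarized as a pure function. This is a sketch under stated assumptions: `extract_api_key` is a hypothetical name, and it takes a plain mapping, whereas the real request headers object is case-insensitive.

```python
from typing import Mapping, Optional


def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the API key from the first matching header, or None if none match."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]   # OpenAI-style bearer token
    if auth.startswith("sk-"):
        return auth                    # raw key sent without the "Bearer " prefix
    # Anthropic-style clients: x-api-key first, then anthropic-auth-token.
    return headers.get("x-api-key") or headers.get("anthropic-auth-token") or None
```

A `None` result corresponds to the 401 branch in the handler; the database lookup of the key remains a separate step.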
@ -79,6 +70,8 @@ def apply_env_settings():
|
|||||||
try:
|
try:
|
||||||
if url := os.getenv("OLLAMA_URL"):
|
if url := os.getenv("OLLAMA_URL"):
|
||||||
crud.set_setting(db, "ollama_url", url)
|
crud.set_setting(db, "ollama_url", url)
|
||||||
|
if model := os.getenv("DEFAULT_MODEL"):
|
||||||
|
crud.set_setting(db, "default_model", model)
|
||||||
db.commit()
|
db.commit()
|
||||||
finally:
|
finally:
|
||||||
db.close()
|
db.close()
|
||||||
@ -89,25 +82,15 @@ async def unhandled_exception_handler(request: Request, exc: Exception):
|
|||||||
request.method, request.url.path, type(exc).__name__, exc, exc_info=exc)
|
request.method, request.url.path, type(exc).__name__, exc, exc_info=exc)
|
||||||
return JSONResponse(status_code=500, content={"error": {"message": "Internal server error", "type": "server_error"}})
|
return JSONResponse(status_code=500, content={"error": {"message": "Internal server error", "type": "server_error"}})
|
||||||
|
|
||||||
def _backend_headers() -> dict:
|
|
||||||
key = os.getenv("BACKEND_API_KEY")
|
|
||||||
return {"Authorization": f"Bearer {key}"} if key else {}
|
|
||||||
|
|
||||||
|
|
||||||
async def proxy_request(url: str, method: str = "GET", json_data: dict = None):
|
async def proxy_request(url: str, method: str = "GET", json_data: dict = None):
|
||||||
async with httpx.AsyncClient(timeout=300.0) as client:
|
async with httpx.AsyncClient(timeout=300.0) as client:
|
||||||
response = await client.request(method=method, url=url, json=json_data, headers=_backend_headers())
|
response = await client.request(method=method, url=url, json=json_data)
|
||||||
return response
|
return response
|
||||||
|
|
||||||
@app.post("/api/generate")
|
@app.post("/api/generate")
|
||||||
async def generate(request: Request, db: Session = Depends(get_db)):
|
async def generate(request: Request, db: Session = Depends(get_db)):
|
||||||
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
|
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
|
||||||
body = await request.json()
|
body = await request.json()
|
||||||
force_model = crud.get_setting(db, "force_model") or None
|
|
||||||
if force_model:
|
|
||||||
body = {**body, "model": force_model}
|
|
||||||
if not body.get("model"):
|
|
||||||
raise HTTPException(status_code=422, detail="Field 'model' is required")
|
|
||||||
prompt_tokens = crud.count_tokens(body.get("prompt", ""))
|
prompt_tokens = crud.count_tokens(body.get("prompt", ""))
|
||||||
|
|
||||||
if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
|
if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
|
||||||
@ -116,15 +99,9 @@ async def generate(request: Request, db: Session = Depends(get_db)):
|
|||||||
prompt_preview = (body.get("prompt", "").replace("\n", " ").strip())[:120]
|
prompt_preview = (body.get("prompt", "").replace("\n", " ").strip())[:120]
|
||||||
usage_log.info('%s | /api/generate | %s | ~%d tokens | "%s"',
|
usage_log.info('%s | /api/generate | %s | ~%d tokens | "%s"',
|
||||||
request.state.api_key_name, body.get("model", "?"), prompt_tokens, prompt_preview)
|
request.state.api_key_name, body.get("model", "?"), prompt_tokens, prompt_preview)
|
||||||
start = time.monotonic()
|
|
||||||
try:
|
try:
|
||||||
response = await proxy_request(f"{ollama_url}/api/generate", method="POST", json_data=body)
|
response = await proxy_request(f"{ollama_url}/api/generate", method="POST", json_data=body)
|
||||||
resp_json = response.json()
|
return JSONResponse(content=response.json(), status_code=response.status_code)
|
||||||
usage_log.info('%s | /api/generate | %s | actual ↑%d ↓%d tokens | %.1fs',
|
|
||||||
request.state.api_key_name, body.get("model", "?"),
|
|
||||||
resp_json.get("prompt_eval_count", 0), resp_json.get("eval_count", 0),
|
|
||||||
time.monotonic() - start)
|
|
||||||
return JSONResponse(content=resp_json, status_code=response.status_code)
|
|
||||||
except Exception as exc:
|
except Exception as exc:
|
||||||
error_log.error("Proxy error | %s | /api/generate | %s | %s: %s",
|
error_log.error("Proxy error | %s | /api/generate | %s | %s: %s",
|
||||||
request.state.api_key_name, body.get("model", "?"), type(exc).__name__, exc, exc_info=exc)
|
request.state.api_key_name, body.get("model", "?"), type(exc).__name__, exc, exc_info=exc)
|
||||||
@ -134,11 +111,6 @@ async def generate(request: Request, db: Session = Depends(get_db)):
|
|||||||
async def chat(request: Request, db: Session = Depends(get_db)):
|
async def chat(request: Request, db: Session = Depends(get_db)):
|
||||||
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
|
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
|
||||||
body = await request.json()
|
body = await request.json()
|
||||||
force_model = crud.get_setting(db, "force_model") or None
|
|
||||||
if force_model:
|
|
||||||
body = {**body, "model": force_model}
|
|
||||||
if not body.get("model"):
|
|
||||||
raise HTTPException(status_code=422, detail="Field 'model' is required")
|
|
||||||
messages = body.get("messages", [])
|
messages = body.get("messages", [])
|
||||||
prompt_tokens = sum(crud.count_tokens(_content_to_str(msg.get("content"))) for msg in messages)
|
prompt_tokens = sum(crud.count_tokens(_content_to_str(msg.get("content"))) for msg in messages)
|
||||||
|
|
||||||
@ -147,15 +119,9 @@ async def chat(request: Request, db: Session = Depends(get_db)):
|
|||||||
|
|
||||||
usage_log.info('%s | /api/chat | %s | ~%d tokens | "%s"',
|
usage_log.info('%s | /api/chat | %s | ~%d tokens | "%s"',
|
||||||
request.state.api_key_name, body.get("model", "?"), prompt_tokens, _last_user_msg(messages))
|
request.state.api_key_name, body.get("model", "?"), prompt_tokens, _last_user_msg(messages))
|
||||||
start = time.monotonic()
|
|
||||||
try:
|
try:
|
||||||
response = await proxy_request(f"{ollama_url}/api/chat", method="POST", json_data=body)
|
response = await proxy_request(f"{ollama_url}/api/chat", method="POST", json_data=body)
|
||||||
resp_json = response.json()
|
return JSONResponse(content=response.json(), status_code=response.status_code)
|
||||||
usage_log.info('%s | /api/chat | %s | actual ↑%d ↓%d tokens | %.1fs',
|
|
||||||
request.state.api_key_name, body.get("model", "?"),
|
|
||||||
resp_json.get("prompt_eval_count", 0), resp_json.get("eval_count", 0),
|
|
||||||
time.monotonic() - start)
|
|
||||||
return JSONResponse(content=resp_json, status_code=response.status_code)
|
|
||||||
except Exception as exc:
|
except Exception as exc:
|
||||||
error_log.error("Proxy error | %s | /api/chat | %s | %s: %s",
|
error_log.error("Proxy error | %s | /api/chat | %s | %s: %s",
|
||||||
request.state.api_key_name, body.get("model", "?"), type(exc).__name__, exc, exc_info=exc)
|
request.state.api_key_name, body.get("model", "?"), type(exc).__name__, exc, exc_info=exc)
|
||||||
@ -167,226 +133,12 @@ async def list_models(db: Session = Depends(get_db)):
|
|||||||
response = await proxy_request(f"{ollama_url}/api/tags", method="GET")
|
response = await proxy_request(f"{ollama_url}/api/tags", method="GET")
|
||||||
return JSONResponse(content=response.json(), status_code=response.status_code)
|
return JSONResponse(content=response.json(), status_code=response.status_code)
|
||||||
|
|
||||||
@app.get("/version")
|
|
||||||
async def version():
|
|
||||||
return {"version": os.getenv("APP_VERSION", "dev")}
|
|
||||||
|
|
||||||
@app.get("/api/ps")
|
|
||||||
async def running_models(db: Session = Depends(get_db)):
|
|
||||||
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
|
|
||||||
response = await proxy_request(f"{ollama_url}/api/ps", method="GET")
|
|
||||||
return JSONResponse(content=response.json(), status_code=response.status_code)
|
|
||||||
|
|
||||||
@app.get("/api/versions")
|
@app.get("/api/versions")
|
||||||
async def versions(db: Session = Depends(get_db)):
|
async def versions(db: Session = Depends(get_db)):
|
||||||
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
|
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
|
||||||
response = await proxy_request(f"{ollama_url}/api/versions", method="GET")
|
response = await proxy_request(f"{ollama_url}/api/versions", method="GET")
|
||||||
return JSONResponse(content=response.json(), status_code=response.status_code)
|
return JSONResponse(content=response.json(), status_code=response.status_code)
|
||||||
|
|
||||||
|
|
||||||
# --- Anthropic Messages API compatibility layer ---
|
|
||||||
|
|
||||||
def _anthropic_content_to_str(content) -> str:
|
|
||||||
"""Flatten Anthropic content (string or block array) to a plain string."""
|
|
||||||
if isinstance(content, str):
|
|
||||||
return content
|
|
||||||
if isinstance(content, list):
|
|
||||||
parts = []
|
|
||||||
for block in content:
|
|
||||||
if not isinstance(block, dict):
|
|
||||||
continue
|
|
||||||
if block.get("type") == "text":
|
|
||||||
parts.append(block.get("text", ""))
|
|
||||||
elif block.get("type") == "tool_result":
|
|
||||||
raw = block.get("content", "")
|
|
||||||
if isinstance(raw, list):
|
|
||||||
raw = " ".join(r.get("text", "") for r in raw if isinstance(r, dict) and r.get("type") == "text")
|
|
||||||
parts.append(str(raw))
|
|
||||||
return " ".join(parts)
|
|
||||||
return str(content) if content else ""
|
|
||||||
|
|
||||||
|
|
||||||
def _anthropic_messages_to_ollama(messages: list, system: str = None) -> list:
|
|
||||||
"""Transform Anthropic messages array to Ollama /api/chat format."""
|
|
||||||
result = []
|
|
||||||
if system:
|
|
||||||
result.append({"role": "system", "content": system})
|
|
||||||
for msg in messages:
|
|
||||||
role = msg.get("role")
|
|
||||||
content = msg.get("content")
|
|
||||||
if role == "assistant" and isinstance(content, list):
|
|
||||||
text = " ".join(b.get("text", "") for b in content if isinstance(b, dict) and b.get("type") == "text")
|
|
||||||
tool_calls = [
|
|
||||||
{"function": {"name": b["name"], "arguments": b.get("input", {})}}
|
|
||||||
for b in content if isinstance(b, dict) and b.get("type") == "tool_use"
|
|
||||||
]
|
|
||||||
entry = {"role": "assistant", "content": text}
|
|
||||||
if tool_calls:
|
|
||||||
entry["tool_calls"] = tool_calls
|
|
||||||
result.append(entry)
|
|
||||||
elif role == "user" and isinstance(content, list):
|
|
||||||
text_parts = []
|
|
||||||
for block in content:
|
|
||||||
if not isinstance(block, dict):
|
|
||||||
continue
|
|
||||||
if block.get("type") == "tool_result":
|
|
||||||
if text_parts:
|
|
||||||
result.append({"role": "user", "content": " ".join(text_parts)})
|
|
||||||
text_parts = []
|
|
||||||
raw = block.get("content", "")
|
|
||||||
if isinstance(raw, list):
|
|
||||||
raw = " ".join(r.get("text", "") for r in raw if isinstance(r, dict) and r.get("type") == "text")
|
|
||||||
result.append({"role": "tool", "content": str(raw)})
|
|
||||||
elif block.get("type") == "text":
|
|
||||||
text_parts.append(block.get("text", ""))
|
|
||||||
if text_parts:
|
|
||||||
result.append({"role": "user", "content": " ".join(text_parts)})
|
|
||||||
else:
|
|
||||||
result.append({"role": role, "content": _anthropic_content_to_str(content)})
|
|
||||||
return result
|
|
||||||
|
|
||||||
|
|
||||||
def _anthropic_tools_to_ollama(tools: list) -> list:
|
|
||||||
"""Transform Anthropic tools to Ollama/OpenAI function format."""
|
|
||||||
return [
|
|
||||||
{
|
|
||||||
"type": "function",
|
|
||||||
"function": {
|
|
||||||
"name": t["name"],
|
|
||||||
"description": t.get("description", ""),
|
|
||||||
"parameters": t.get("input_schema", {}),
|
|
||||||
},
|
|
||||||
}
|
|
||||||
for t in tools
|
|
||||||
]
|
|
||||||
|
|
||||||
|
|
||||||
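To make the mapping in `_anthropic_tools_to_ollama` concrete, here is a small worked example. The function body restates the same field mapping under a hypothetical name, and the `get_weather` tool is invented purely for illustration.

```python
def anthropic_tools_to_functions(tools: list) -> list:
    """Map Anthropic tool definitions to the OpenAI/Ollama function format."""
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t.get("description", ""),
                # Anthropic calls the JSON schema "input_schema"; Ollama expects "parameters".
                "parameters": t.get("input_schema", {}),
            },
        }
        for t in tools
    ]


# Hypothetical tool definition in Anthropic format.
example_tools = [{
    "name": "get_weather",
    "description": "Look up the weather for a city",
    "input_schema": {"type": "object", "properties": {"city": {"type": "string"}}},
}]
converted = anthropic_tools_to_functions(example_tools)
```

Only the key names and the nesting change; the JSON schema itself is passed through untouched.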
def _ollama_to_anthropic_response(ollama_resp: dict, model_name: str, msg_id: str) -> dict:
|
|
||||||
"""Transform an Ollama /api/chat response to Anthropic Messages API format."""
|
|
||||||
msg = ollama_resp.get("message", {})
|
|
||||||
text = msg.get("content", "")
|
|
||||||
tool_calls = msg.get("tool_calls") or []
|
|
||||||
|
|
||||||
content_blocks = []
|
|
||||||
if text:
|
|
||||||
content_blocks.append({"type": "text", "text": text})
|
|
||||||
|
|
||||||
stop_reason = "end_turn"
|
|
||||||
for i, tc in enumerate(tool_calls):
|
|
||||||
stop_reason = "tool_use"
|
|
||||||
fn = tc.get("function", {})
|
|
||||||
args = fn.get("arguments", {})
|
|
||||||
if isinstance(args, str):
|
|
||||||
try:
|
|
||||||
args = json.loads(args)
|
|
||||||
except json.JSONDecodeError:
|
|
||||||
args = {}
|
|
||||||
content_blocks.append({
|
|
||||||
"type": "tool_use",
|
|
||||||
"id": f"toolu_{msg_id}_{i}",
|
|
||||||
"name": fn.get("name", ""),
|
|
||||||
"input": args,
|
|
||||||
})
|
|
||||||
|
|
||||||
return {
|
|
||||||
"id": f"msg_{msg_id}",
|
|
||||||
"type": "message",
|
|
||||||
"role": "assistant",
|
|
||||||
"content": content_blocks,
|
|
||||||
"model": model_name,
|
|
||||||
"stop_reason": stop_reason,
|
|
||||||
"stop_sequence": None,
|
|
||||||
"usage": {
|
|
||||||
"input_tokens": ollama_resp.get("prompt_eval_count", 0),
|
|
||||||
"output_tokens": ollama_resp.get("eval_count", 0),
|
|
||||||
},
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
@app.post("/v1/messages")
|
|
||||||
async def anthropic_messages(request: Request, db: Session = Depends(get_db)):
|
|
||||||
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
|
|
||||||
body = await request.json()
|
|
||||||
|
|
||||||
force_model = crud.get_setting(db, "force_model") or None
|
|
||||||
model_name = force_model or os.getenv("ANTHROPIC_DEFAULT_MODEL") or body.get("model")
|
|
||||||
if not model_name:
|
|
||||||
raise HTTPException(status_code=422, detail="Field 'model' is required")
|
|
||||||
|
|
||||||
anthropic_msgs = body.get("messages", [])
|
|
||||||
system = body.get("system")
|
|
||||||
|
|
||||||
system_str = _anthropic_content_to_str(system) if system else ""
|
|
||||||
all_text = system_str + " ".join(_anthropic_content_to_str(m.get("content")) for m in anthropic_msgs)
|
|
||||||
prompt_tokens = crud.count_tokens(all_text)
|
|
||||||
|
|
||||||
if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
|
|
||||||
raise HTTPException(status_code=429, detail="Quota exceeded")
|
|
||||||
|
|
||||||
ollama_messages = _anthropic_messages_to_ollama(anthropic_msgs, system=system_str)
|
|
||||||
ollama_body: dict = {"model": model_name, "messages": ollama_messages, "stream": body.get("stream", False)}
|
|
||||||
if tools := body.get("tools"):
|
|
||||||
ollama_body["tools"] = _anthropic_tools_to_ollama(tools)
|
|
||||||
|
|
||||||
msg_id = secrets.token_hex(12)
|
|
||||||
target = f"{ollama_url}/api/chat"
|
|
||||||
|
|
||||||
usage_log.info('%s | /v1/messages | %s | ~%d tokens | "%s"',
|
|
||||||
request.state.api_key_name, model_name, prompt_tokens, _last_user_msg(ollama_messages))
|
|
||||||
start = time.monotonic()
|
|
||||||
|
|
||||||
if body.get("stream"):
|
|
||||||
# The backend is always called in non-streaming mode; the dev proxy builds the SSE stream itself.
|
|
||||||
# This is necessary because upstream proxies (e.g. the production proxy) expose /api/chat
|
|
||||||
# only in non-streaming mode.
|
|
||||||
non_stream_body = {**ollama_body, "stream": False}
|
|
||||||
|
|
||||||
async def generate():
|
|
||||||
try:
|
|
||||||
response = await proxy_request(target, method="POST", json_data=non_stream_body)
|
|
||||||
ollama_resp = response.json()
|
|
||||||
except Exception as exc:
|
|
||||||
error_log.error("Stream error | %s | /v1/messages | %s | %s: %s",
|
|
||||||
request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
|
|
||||||
raise
|
|
||||||
|
|
||||||
msg = ollama_resp.get("message", {})
|
|
||||||
text = msg.get("content", "")
|
|
||||||
input_tokens = ollama_resp.get("prompt_eval_count", 0)
|
|
||||||
output_tokens = ollama_resp.get("eval_count", 0)
|
|
||||||
|
|
||||||
yield f"event: message_start\ndata: {json.dumps({'type': 'message_start', 'message': {'id': f'msg_{msg_id}', 'type': 'message', 'role': 'assistant', 'content': [], 'model': model_name, 'stop_reason': None, 'stop_sequence': None, 'usage': {'input_tokens': input_tokens, 'output_tokens': 0}}})}\n\n"
|
|
||||||
yield f"event: content_block_start\ndata: {json.dumps({'type': 'content_block_start', 'index': 0, 'content_block': {'type': 'text', 'text': ''}})}\n\n"
|
|
||||||
yield f"event: ping\ndata: {json.dumps({'type': 'ping'})}\n\n"
|
|
||||||
if text:
|
|
||||||
yield f"event: content_block_delta\ndata: {json.dumps({'type': 'content_block_delta', 'index': 0, 'delta': {'type': 'text_delta', 'text': text}})}\n\n"
|
|
||||||
yield f"event: content_block_stop\ndata: {json.dumps({'type': 'content_block_stop', 'index': 0})}\n\n"
|
|
||||||
yield f"event: message_delta\ndata: {json.dumps({'type': 'message_delta', 'delta': {'stop_reason': 'end_turn', 'stop_sequence': None}, 'usage': {'output_tokens': output_tokens}})}\n\n"
|
|
||||||
yield f"event: message_stop\ndata: {json.dumps({'type': 'message_stop'})}\n\n"
|
|
||||||
usage_log.info('%s | /v1/messages | %s | actual ↑%d ↓%d tokens | %.1fs',
|
|
||||||
request.state.api_key_name, model_name,
|
|
||||||
input_tokens, output_tokens,
|
|
||||||
time.monotonic() - start)
|
|
||||||
|
|
||||||
return StreamingResponse(
|
|
||||||
generate(),
|
|
||||||
media_type="text/event-stream",
|
|
||||||
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
|
|
||||||
)
|
|
||||||
|
|
||||||
try:
|
|
||||||
response = await proxy_request(target, method="POST", json_data=ollama_body)
|
|
||||||
result = _ollama_to_anthropic_response(response.json(), model_name, msg_id)
|
|
||||||
usage_log.info('%s | /v1/messages | %s | actual ↑%d ↓%d tokens | %.1fs',
|
|
||||||
request.state.api_key_name, model_name,
|
|
||||||
result["usage"]["input_tokens"], result["usage"]["output_tokens"],
|
|
||||||
time.monotonic() - start)
|
|
||||||
return JSONResponse(content=result, status_code=response.status_code)
|
|
||||||
except Exception as exc:
|
|
||||||
error_log.error("Proxy error | %s | /v1/messages | %s | %s: %s",
|
|
||||||
request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
|
|
||||||
raise
|
|
||||||
|
|
||||||
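The streaming branch of `anthropic_messages` emits a fixed sequence of server-sent events built with long inline f-strings. The same sequence could be produced with a small framing helper, sketched below. `sse_event` and `text_message_events` are hypothetical names, and the sketch covers only the text-only case handled in the hunk.

```python
import json
from typing import Iterator


def sse_event(event: str, payload: dict) -> str:
    """Frame one server-sent event: an event line, a data line, then a blank line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def text_message_events(msg_id: str, model: str, text: str,
                        input_tokens: int, output_tokens: int) -> Iterator[str]:
    """Yield the Anthropic-style event sequence for a single text reply."""
    yield sse_event("message_start", {"type": "message_start", "message": {
        "id": f"msg_{msg_id}", "type": "message", "role": "assistant", "content": [],
        "model": model, "stop_reason": None, "stop_sequence": None,
        "usage": {"input_tokens": input_tokens, "output_tokens": 0}}})
    yield sse_event("content_block_start", {
        "type": "content_block_start", "index": 0,
        "content_block": {"type": "text", "text": ""}})
    yield sse_event("ping", {"type": "ping"})
    if text:
        # The whole reply is sent as one delta because the backend call is non-streaming.
        yield sse_event("content_block_delta", {
            "type": "content_block_delta", "index": 0,
            "delta": {"type": "text_delta", "text": text}})
    yield sse_event("content_block_stop", {"type": "content_block_stop", "index": 0})
    yield sse_event("message_delta", {
        "type": "message_delta",
        "delta": {"stop_reason": "end_turn", "stop_sequence": None},
        "usage": {"output_tokens": output_tokens}})
    yield sse_event("message_stop", {"type": "message_stop"})
```

Factoring out the framing keeps each event payload readable and lets the event order be checked in a test without starting the server.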
@app.get("/v1/models")
|
@app.get("/v1/models")
|
||||||
async def list_openai_models(db: Session = Depends(get_db)):
|
async def list_openai_models(db: Session = Depends(get_db)):
|
||||||
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
|
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
|
||||||
@ -396,73 +148,44 @@ async def list_openai_models(db: Session = Depends(get_db)):
|
|||||||
@app.post("/v1/chat/completions")
|
@app.post("/v1/chat/completions")
|
||||||
async def openai_chat_completions(request: Request, db: Session = Depends(get_db)):
|
async def openai_chat_completions(request: Request, db: Session = Depends(get_db)):
|
||||||
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
|
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
|
||||||
|
default_model = crud.get_setting(db, "default_model", os.getenv("DEFAULT_MODEL", "llama3"))
|
||||||
|
|
||||||
body = await request.json()
|
body = await request.json()
|
||||||
force_model = crud.get_setting(db, "force_model") or None
|
|
||||||
if force_model:
|
|
||||||
body = {**body, "model": force_model}
|
|
||||||
if not body.get("model"):
|
|
||||||
raise HTTPException(status_code=422, detail="Field 'model' is required")
|
|
||||||
messages = body.get("messages", [])
|
messages = body.get("messages", [])
|
||||||
prompt_tokens = sum(crud.count_tokens(_content_to_str(msg.get("content"))) for msg in messages)
|
prompt_tokens = sum(crud.count_tokens(_content_to_str(msg.get("content"))) for msg in messages)
|
||||||
|
|
||||||
if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
|
if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
|
||||||
raise HTTPException(status_code=429, detail="Quota exceeded")
|
raise HTTPException(status_code=429, detail="Quota exceeded")
|
||||||
|
|
||||||
model_name = body["model"]
|
if "model" not in body:
|
||||||
|
body = {**body, "model": default_model}
|
||||||
|
|
||||||
|
model_name = body["model"]
|
||||||
usage_log.info('%s | /v1/chat/completions | %s | ~%d tokens | "%s"',
|
usage_log.info('%s | /v1/chat/completions | %s | ~%d tokens | "%s"',
|
||||||
request.state.api_key_name, model_name, prompt_tokens, _last_user_msg(messages))
|
request.state.api_key_name, model_name, prompt_tokens, _last_user_msg(messages))
|
||||||
|
|
||||||
target = f"{ollama_url}/v1/chat/completions"
|
target = f"{ollama_url}/v1/chat/completions"
|
||||||
|
|
||||||
if body.get("stream"):
|
if body.get("stream"):
|
||||||
existing_opts = body.get("stream_options") or {}
|
|
||||||
stream_body = {**body, "stream_options": {**existing_opts, "include_usage": True}}
|
|
||||||
start = time.monotonic()
|
|
||||||
usage_tokens = {"prompt": 0, "completion": 0}
|
|
||||||
|
|
||||||
async def generate():
|
async def generate():
|
||||||
try:
|
try:
|
||||||
async with httpx.AsyncClient(timeout=300.0) as client:
|
async with httpx.AsyncClient(timeout=300.0) as client:
|
||||||
async with client.stream("POST", target, json=stream_body, headers=_backend_headers()) as resp:
|
async with client.stream("POST", target, json=body) as resp:
|
||||||
async for chunk in resp.aiter_bytes():
|
async for chunk in resp.aiter_bytes():
|
||||||
try:
|
|
||||||
for line in chunk.decode("utf-8", errors="ignore").splitlines():
|
|
||||||
if line.startswith("data: ") and "[DONE]" not in line:
|
|
||||||
data = json.loads(line[6:])
|
|
||||||
if u := data.get("usage"):
|
|
||||||
usage_tokens["prompt"] = u.get("prompt_tokens", 0)
|
|
||||||
usage_tokens["completion"] = u.get("completion_tokens", 0)
|
|
||||||
except Exception:
|
|
||||||
pass
|
|
||||||
yield chunk
|
yield chunk
|
||||||
except Exception as exc:
|
except Exception as exc:
|
||||||
error_log.error("Stream error | %s | /v1/chat/completions | %s | %s: %s",
|
error_log.error("Stream error | %s | /v1/chat/completions | %s | %s: %s",
|
||||||
request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
|
request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
|
||||||
raise
|
raise
|
||||||
finally:
|
|
||||||
usage_log.info('%s | /v1/chat/completions | %s | actual ↑%d ↓%d tokens | %.1fs',
|
|
||||||
request.state.api_key_name, model_name,
|
|
||||||
usage_tokens["prompt"], usage_tokens["completion"],
|
|
||||||
time.monotonic() - start)
|
|
||||||
|
|
||||||
return StreamingResponse(
|
return StreamingResponse(
|
||||||
generate(),
|
generate(),
|
||||||
media_type="text/event-stream",
|
media_type="text/event-stream",
|
||||||
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
|
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
|
||||||
)
|
)
|
||||||
|
|
||||||
start = time.monotonic()
|
|
||||||
try:
|
try:
|
||||||
response = await proxy_request(target, method="POST", json_data=body)
|
response = await proxy_request(target, method="POST", json_data=body)
|
||||||
resp_json = response.json()
|
return JSONResponse(content=response.json(), status_code=response.status_code)
|
||||||
usage = resp_json.get("usage", {})
|
|
||||||
usage_log.info('%s | /v1/chat/completions | %s | actual ↑%d ↓%d tokens | %.1fs',
|
|
||||||
request.state.api_key_name, model_name,
|
|
||||||
usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0),
|
|
||||||
time.monotonic() - start)
|
|
||||||
return JSONResponse(content=resp_json, status_code=response.status_code)
|
|
||||||
except Exception as exc:
|
except Exception as exc:
|
||||||
error_log.error("Proxy error | %s | /v1/chat/completions | %s | %s: %s",
|
error_log.error("Proxy error | %s | /v1/chat/completions | %s | %s: %s",
|
||||||
request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
|
request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
|
||||||
|
|||||||
@ -40,7 +40,7 @@ class QuotaUpdate(BaseModel):
|
|||||||
|
|
||||||
class Settings(BaseModel):
|
class Settings(BaseModel):
|
||||||
ollama_url: str
|
ollama_url: str
|
||||||
force_model: Optional[str] = None
|
default_model: str
|
||||||
|
|
||||||
class UsageStats(BaseModel):
|
class UsageStats(BaseModel):
|
||||||
tokens_used_today: int = 0
|
tokens_used_today: int = 0
|
||||||
|
|||||||
@ -1,59 +0,0 @@
|
|||||||
import os
|
|
||||||
import pytest
|
|
||||||
from fastapi.testclient import TestClient
|
|
||||||
|
|
||||||
os.environ.setdefault("ADMIN_PASSWORD", "test-admin-pw")
|
|
||||||
os.environ.setdefault("OLLAMA_URL", "http://127.0.0.1:9999")
|
|
||||||
|
|
||||||
|
|
||||||
@pytest.fixture
|
|
||||||
def client(tmp_path):
|
|
||||||
log_file = tmp_path / "usage.log"
|
|
||||||
log_file.write_text("\n".join(f"Zeile {i}" for i in range(1, 16)) + "\n")
|
|
||||||
(tmp_path / "error.log").write_text("Fehler A\nFehler B\n")
|
|
||||||
os.environ["LOG_FILE"] = str(log_file)
|
|
||||||
|
|
||||||
from database import Base, engine
|
|
||||||
Base.metadata.drop_all(bind=engine)
|
|
||||||
Base.metadata.create_all(bind=engine)
|
|
||||||
|
|
||||||
from admin import app
|
|
||||||
yield TestClient(app, raise_server_exceptions=False)
|
|
||||||
|
|
||||||
Base.metadata.drop_all(bind=engine)
|
|
||||||
os.environ.pop("LOG_FILE", None)
|
|
||||||
|
|
||||||
|
|
||||||
AUTH = {"Authorization": "Bearer test-admin-pw"}
|
|
||||||
|
|
||||||
|
|
||||||
def test_logs_usage_returns_last_10_lines(client):
|
|
||||||
resp = client.get("/api/logs/usage", headers=AUTH)
|
|
||||||
assert resp.status_code == 200
|
|
||||||
lines = resp.json()["lines"]
|
|
||||||
assert len(lines) == 10
|
|
||||||
assert lines[-1] == "Zeile 15"
|
|
||||||
assert lines[0] == "Zeile 6"
|
|
||||||
|
|
||||||
|
|
||||||
def test_logs_error_returns_content(client):
|
|
||||||
resp = client.get("/api/logs/error", headers=AUTH)
|
|
||||||
assert resp.status_code == 200
|
|
||||||
assert resp.json()["lines"] == ["Fehler A", "Fehler B"]
|
|
||||||
|
|
||||||
|
|
||||||
def test_logs_missing_file_returns_empty(client, tmp_path):
|
|
||||||
os.environ["LOG_FILE"] = str(tmp_path / "nonexistent.log")
|
|
||||||
resp = client.get("/api/logs/usage", headers=AUTH)
|
|
||||||
assert resp.status_code == 200
|
|
||||||
assert resp.json()["lines"] == []
|
|
||||||
|
|
||||||
|
|
||||||
def test_logs_invalid_name_returns_400(client):
|
|
||||||
resp = client.get("/api/logs/secret", headers=AUTH)
|
|
||||||
assert resp.status_code == 400
|
|
||||||
|
|
||||||
|
|
||||||
def test_logs_requires_auth(client):
|
|
||||||
resp = client.get("/api/logs/usage")
|
|
||||||
assert resp.status_code == 401
|
|
||||||
@ -1,272 +0,0 @@
|
|||||||
import json
|
|
||||||
import os
|
|
||||||
from unittest.mock import AsyncMock, MagicMock, patch, call
|
|
||||||
|
|
||||||
|
|
||||||
def _make_body(model="llama3", messages=None, stream=False, **kwargs):
|
|
||||||
body = {
|
|
||||||
"model": model,
|
|
||||||
"messages": messages or [{"role": "user", "content": "Hello"}],
|
|
||||||
"max_tokens": 100,
|
|
||||||
}
|
|
||||||
if stream:
|
|
||||||
body["stream"] = True
|
|
||||||
body.update(kwargs)
|
|
||||||
return body
|
|
||||||
|
|
||||||
|
|
||||||
def _ollama_chat_response(content="Hi!", input_tokens=5, output_tokens=3):
|
|
||||||
return {
|
|
||||||
"model": "llama3",
|
|
||||||
"message": {"role": "assistant", "content": content},
|
|
||||||
"prompt_eval_count": input_tokens,
|
|
||||||
"eval_count": output_tokens,
|
|
||||||
"done": True,
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
# --- Auth ---
|
|
||||||
|
|
||||||
def test_messages_missing_auth_returns_401(test_client):
|
|
||||||
response = test_client.post("/v1/messages", json=_make_body())
|
|
||||||
assert response.status_code == 401
|
|
||||||
|
|
||||||
|
|
||||||
def test_messages_invalid_key_returns_401(test_client):
|
|
||||||
response = test_client.post(
|
|
||||||
"/v1/messages",
|
|
||||||
headers={"x-api-key": "sk-invalid"},
|
|
||||||
json=_make_body(),
|
|
||||||
)
|
|
||||||
assert response.status_code == 401
|
|
||||||
|
|
||||||
|
|
||||||
@patch("main.proxy_request", new_callable=AsyncMock)
|
|
||||||
def test_messages_accepts_anthropic_auth_token_header(mock_proxy, test_client):
|
|
||||||
mock_proxy.return_value.status_code = 200
|
|
||||||
mock_proxy.return_value.json = lambda: _ollama_chat_response()
|
|
||||||
response = test_client.post(
|
|
||||||
"/v1/messages",
|
|
||||||
headers={"anthropic-auth-token": os.environ.get("TEST_API_KEY", "")},
|
|
||||||
json=_make_body(),
|
|
||||||
)
|
|
||||||
assert response.status_code == 200
|
|
||||||
|
|
||||||
|
|
||||||
@patch("main.proxy_request", new_callable=AsyncMock)
|
|
||||||
def test_messages_accepts_x_api_key_header(mock_proxy, test_client):
|
|
||||||
mock_proxy.return_value.status_code = 200
|
|
||||||
mock_proxy.return_value.json = lambda: _ollama_chat_response()
|
|
||||||
response = test_client.post(
|
|
||||||
"/v1/messages",
|
|
||||||
headers={"x-api-key": os.environ.get("TEST_API_KEY", "")},
|
|
||||||
json=_make_body(),
|
|
||||||
)
|
|
||||||
assert response.status_code == 200
|
|
||||||
|
|
||||||
|
|
||||||
# --- Validation ---
|
|
||||||
|
|
||||||
def test_messages_missing_model_returns_422(test_client):
|
|
||||||
env = {k: v for k, v in os.environ.items() if k != "ANTHROPIC_DEFAULT_MODEL"}
|
|
||||||
with patch.dict(os.environ, env, clear=True):
|
|
||||||
response = test_client.post(
|
|
||||||
"/v1/messages",
|
|
||||||
headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
|
|
||||||
json={"messages": [{"role": "user", "content": "Hi"}], "max_tokens": 100},
|
|
||||||
)
|
|
||||||
assert response.status_code == 422
|
|
||||||
|
|
||||||
|
|
||||||
@patch("main.proxy_request", new_callable=AsyncMock)
|
|
||||||
def test_messages_anthropic_default_model_used_when_no_model_in_request(mock_proxy, test_client):
|
|
||||||
mock_proxy.return_value.status_code = 200
|
|
||||||
mock_proxy.return_value.json = lambda: _ollama_chat_response()
|
|
||||||
with patch.dict(os.environ, {"ANTHROPIC_DEFAULT_MODEL": "qwen3-coder:q8_0"}):
|
|
||||||
test_client.post(
|
|
||||||
"/v1/messages",
|
|
||||||
headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
|
|
||||||
json={"messages": [{"role": "user", "content": "Hi"}], "max_tokens": 100},
|
|
||||||
)
|
|
||||||
sent_body = mock_proxy.call_args[1]["json_data"]
|
|
||||||
assert sent_body["model"] == "qwen3-coder:q8_0"
|
|
||||||
|
|
||||||
|
|
||||||
# --- Quota ---
|
|
||||||
|
|
||||||
def test_messages_quota_exceeded_returns_429(test_client):
|
|
||||||
with patch("main.crud.check_and_increment_quota", return_value=False):
|
|
||||||
response = test_client.post(
|
|
||||||
"/v1/messages",
|
|
||||||
headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
|
|
||||||
json=_make_body(),
|
|
||||||
)
|
|
||||||
assert response.status_code == 429
|
|
||||||
|
|
||||||
|
|
||||||
# --- Response format ---
|
|
||||||
|
|
||||||
@patch("main.proxy_request", new_callable=AsyncMock)
|
|
||||||
def test_messages_returns_anthropic_format(mock_proxy, test_client):
|
|
||||||
mock_proxy.return_value.status_code = 200
|
|
||||||
mock_proxy.return_value.json = lambda: _ollama_chat_response("Hello!")
|
|
||||||
response = test_client.post(
|
|
||||||
"/v1/messages",
|
|
||||||
headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
|
|
||||||
json=_make_body(),
|
|
||||||
)
|
|
||||||
assert response.status_code == 200
|
|
||||||
data = response.json()
|
|
||||||
assert data["type"] == "message"
|
|
||||||
assert data["role"] == "assistant"
|
|
||||||
assert isinstance(data["content"], list)
|
|
||||||
assert data["content"][0]["type"] == "text"
|
|
||||||
assert data["content"][0]["text"] == "Hello!"
|
|
||||||
assert data["usage"]["input_tokens"] == 5
|
|
||||||
assert data["usage"]["output_tokens"] == 3
|
|
||||||
|
|
||||||
|
|
||||||
# --- Request transformation ---
|
|
||||||
|
|
||||||
@patch("main.proxy_request", new_callable=AsyncMock)
|
|
||||||
def test_messages_system_prompt_becomes_first_system_message(mock_proxy, test_client):
|
|
||||||
mock_proxy.return_value.status_code = 200
|
|
||||||
mock_proxy.return_value.json = lambda: _ollama_chat_response()
|
|
||||||
test_client.post(
|
|
||||||
"/v1/messages",
|
|
||||||
headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
|
|
||||||
json=_make_body(system="You are helpful"),
|
|
||||||
)
|
|
||||||
sent_body = mock_proxy.call_args[1]["json_data"]
|
|
||||||
assert sent_body["messages"][0]["role"] == "system"
|
|
||||||
assert sent_body["messages"][0]["content"] == "You are helpful"
|
|
||||||
|
|
||||||
|
|
||||||
@patch("main.proxy_request", new_callable=AsyncMock)
|
|
||||||
def test_messages_tools_transformed_to_ollama_function_format(mock_proxy, test_client):
|
|
||||||
mock_proxy.return_value.status_code = 200
|
|
||||||
mock_proxy.return_value.json = lambda: _ollama_chat_response()
|
|
||||||
test_client.post(
|
|
||||||
"/v1/messages",
|
|
||||||
headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
|
|
||||||
json=_make_body(tools=[{
|
|
||||||
"name": "bash",
|
|
||||||
"description": "Run bash",
|
|
||||||
"input_schema": {"type": "object", "properties": {"command": {"type": "string"}}},
|
|
||||||
}]),
|
|
||||||
)
|
|
||||||
sent_body = mock_proxy.call_args[1]["json_data"]
|
|
||||||
assert sent_body["tools"][0]["type"] == "function"
|
|
||||||
assert sent_body["tools"][0]["function"]["name"] == "bash"
|
|
||||||
assert "parameters" in sent_body["tools"][0]["function"]
|
|
||||||
|
|
||||||
|
|
||||||
@patch("main.proxy_request", new_callable=AsyncMock)
|
|
||||||
def test_messages_tool_call_response_transformed_to_anthropic(mock_proxy, test_client):
|
|
||||||
mock_proxy.return_value.status_code = 200
|
|
||||||
mock_proxy.return_value.json = lambda: {
|
|
||||||
"model": "llama3",
|
|
||||||
"message": {
|
|
||||||
"role": "assistant",
|
|
||||||
"content": "",
|
|
||||||
"tool_calls": [{"function": {"name": "bash", "arguments": {"command": "ls"}}}],
|
|
||||||
},
|
|
||||||
"prompt_eval_count": 10,
|
|
||||||
"eval_count": 5,
|
|
||||||
"done": True,
|
|
||||||
}
|
|
||||||
response = test_client.post(
|
|
||||||
"/v1/messages",
|
|
||||||
headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
|
|
||||||
json=_make_body(),
|
|
||||||
)
|
|
||||||
data = response.json()
|
|
||||||
assert data["stop_reason"] == "tool_use"
|
|
||||||
tool_block = next(b for b in data["content"] if b["type"] == "tool_use")
|
|
||||||
assert tool_block["name"] == "bash"
|
|
||||||
assert tool_block["input"] == {"command": "ls"}
|
|
||||||
|
|
||||||
|
|
||||||
# --- Streaming ---
|
|
||||||
|
|
||||||
@patch("main.proxy_request", new_callable=AsyncMock)
|
|
||||||
def test_messages_streaming_returns_anthropic_sse_events(mock_proxy, test_client):
|
|
||||||
mock_proxy.return_value.status_code = 200
|
|
||||||
mock_proxy.return_value.json = lambda: {
|
|
||||||
"model": "llama3",
|
|
||||||
"message": {"role": "assistant", "content": "Hi!"},
|
|
||||||
"prompt_eval_count": 5,
|
|
||||||
"eval_count": 3,
|
|
||||||
"done": True,
|
|
||||||
}
|
|
||||||
|
|
||||||
response = test_client.post(
|
|
||||||
"/v1/messages",
|
|
||||||
headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
|
|
||||||
json=_make_body(stream=True),
|
|
||||||
)
|
|
||||||
|
|
||||||
assert response.status_code == 200
|
|
||||||
events = [
|
|
||||||
json.loads(line[6:])
|
|
||||||
for line in response.text.splitlines()
|
|
||||||
if line.startswith("data: ")
|
|
||||||
]
|
|
||||||
event_types = [e["type"] for e in events]
|
|
||||||
assert "message_start" in event_types
|
|
||||||
assert "content_block_start" in event_types
|
|
||||||
assert "content_block_delta" in event_types
|
|
||||||
assert "message_stop" in event_types
|
|
||||||
|
|
||||||
deltas = [e for e in events if e["type"] == "content_block_delta"]
|
|
||||||
text = "".join(d["delta"]["text"] for d in deltas)
|
|
||||||
assert text == "Hi!"
|
|
||||||
|
|
||||||
|
|
||||||
# --- Backend-Auth (BACKEND_API_KEY) ---
|
|
||||||
|
|
||||||
def test_proxy_request_forwards_backend_api_key(test_client):
|
|
||||||
with patch("main.httpx.AsyncClient") as mock_cls:
|
|
||||||
mock_response = MagicMock()
|
|
||||||
mock_response.status_code = 200
|
|
||||||
mock_response.json.return_value = {"result": "ok"}
|
|
||||||
|
|
||||||
mock_instance = AsyncMock()
|
|
||||||
mock_instance.__aenter__ = AsyncMock(return_value=mock_instance)
|
|
||||||
mock_instance.__aexit__ = AsyncMock(return_value=False)
|
|
||||||
mock_instance.request = AsyncMock(return_value=mock_response)
|
|
||||||
mock_cls.return_value = mock_instance
|
|
||||||
|
|
||||||
with patch.dict(os.environ, {"BACKEND_API_KEY": "sk-backend-secret"}):
|
|
||||||
test_client.post(
|
|
||||||
"/api/generate",
|
|
||||||
headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
|
|
||||||
json={"model": "llama3", "prompt": "hi"},
|
|
||||||
)
|
|
||||||
|
|
||||||
_, kwargs = mock_instance.request.call_args
|
|
||||||
assert kwargs.get("headers", {}).get("Authorization") == "Bearer sk-backend-secret"
|
|
||||||
|
|
||||||
|
|
||||||
def test_proxy_request_omits_auth_header_when_no_backend_key(test_client):
|
|
||||||
with patch("main.httpx.AsyncClient") as mock_cls:
|
|
||||||
mock_response = MagicMock()
|
|
||||||
mock_response.status_code = 200
|
|
||||||
mock_response.json.return_value = {"result": "ok"}
|
|
||||||
|
|
||||||
mock_instance = AsyncMock()
|
|
||||||
mock_instance.__aenter__ = AsyncMock(return_value=mock_instance)
|
|
||||||
mock_instance.__aexit__ = AsyncMock(return_value=False)
|
|
||||||
mock_instance.request = AsyncMock(return_value=mock_response)
|
|
||||||
mock_cls.return_value = mock_instance
|
|
||||||
|
|
||||||
env_without_key = {k: v for k, v in os.environ.items() if k != "BACKEND_API_KEY"}
|
|
||||||
with patch.dict(os.environ, env_without_key, clear=True):
|
|
||||||
test_client.post(
|
|
||||||
"/api/generate",
|
|
||||||
headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
|
|
||||||
json={"model": "llama3", "prompt": "hi"},
|
|
||||||
)
|
|
||||||
|
|
||||||
_, kwargs = mock_instance.request.call_args
|
|
||||||
assert "Authorization" not in kwargs.get("headers", {})
|
|
||||||
@ -37,7 +37,6 @@ echo ""
|
|||||||
docker buildx build \
|
docker buildx build \
|
||||||
--platform "$PLATFORM" \
|
--platform "$PLATFORM" \
|
||||||
--push \
|
--push \
|
||||||
--build-arg APP_VERSION="$VERSION" \
|
|
||||||
-t "$IMAGE:$VERSION" \
|
-t "$IMAGE:$VERSION" \
|
||||||
-t "$IMAGE:latest" \
|
-t "$IMAGE:latest" \
|
||||||
.
|
.
|
||||||
|
|||||||
@ -76,17 +76,14 @@ const EMPTY_KEY_FORM = {
|
|||||||
name: '', expires_at: '', daily_tokens: '', monthly_tokens: '', daily_requests: '', monthly_requests: '',
|
name: '', expires_at: '', daily_tokens: '', monthly_tokens: '', daily_requests: '', monthly_requests: '',
|
||||||
};
|
};
|
||||||
|
|
||||||
function SettingsSection({ password, refreshKey }) {
|
function SettingsSection({ password }) {
|
||||||
const [settings, setSettings] = useState(null);
|
const [settings, setSettings] = useState(null);
|
||||||
const [availableModels, setAvailableModels] = useState([]);
|
const [availableModels, setAvailableModels] = useState([]);
|
||||||
const [modelsLoading, setModelsLoading] = useState(false);
|
const [modelsLoading, setModelsLoading] = useState(false);
|
||||||
const [ollamaReachable, setOllamaReachable] = useState(true);
|
const [ollamaReachable, setOllamaReachable] = useState(true);
|
||||||
const [proxyEndpoint, setProxyEndpoint] = useState(null);
|
const [proxyEndpoint, setProxyEndpoint] = useState(null);
|
||||||
const [appVersion, setAppVersion] = useState(null);
|
|
||||||
const [saved, setSaved] = useState(false);
|
const [saved, setSaved] = useState(false);
|
||||||
const [error, setError] = useState(null);
|
const [error, setError] = useState(null);
|
||||||
const [usageLog, setUsageLog] = useState([]);
|
|
||||||
const [errorLog, setErrorLog] = useState([]);
|
|
||||||
|
|
||||||
const fetchModels = async (url, currentModel) => {
|
const fetchModels = async (url, currentModel) => {
|
||||||
setModelsLoading(true);
|
setModelsLoading(true);
|
||||||
@ -98,8 +95,8 @@ function SettingsSection({ password, refreshKey }) {
|
|||||||
const { models, reachable } = res.data;
|
const { models, reachable } = res.data;
|
||||||
setOllamaReachable(reachable);
|
setOllamaReachable(reachable);
|
||||||
setAvailableModels(models);
|
setAvailableModels(models);
|
||||||
if (models.length > 0 && currentModel && !models.includes(currentModel)) {
|
if (models.length > 0 && !models.includes(currentModel)) {
|
||||||
setSettings(s => ({ ...s, force_model: models[0] }));
|
setSettings(s => ({ ...s, default_model: models[0] }));
|
||||||
}
|
}
|
||||||
} catch {
|
} catch {
|
||||||
setOllamaReachable(false);
|
setOllamaReachable(false);
|
||||||
@ -111,25 +108,16 @@ function SettingsSection({ password, refreshKey }) {
|
|||||||
|
|
||||||
useEffect(() => {
|
useEffect(() => {
|
||||||
const headers = authHeaders(password);
|
const headers = authHeaders(password);
|
||||||
Promise.allSettled([
|
Promise.all([
|
||||||
axios.get('/api/settings', { headers }),
|
axios.get('/api/settings', { headers }),
|
||||||
axios.get('/api/proxy-info', { headers }),
|
axios.get('/api/proxy-info', { headers }),
|
||||||
axios.get('/api/logs/usage', { headers }),
|
]).then(([settingsRes, proxyRes]) => {
|
||||||
axios.get('/api/logs/error', { headers }),
|
const s = settingsRes.data;
|
||||||
]).then(([settingsRes, proxyRes, usageRes, errorRes]) => {
|
|
||||||
if (settingsRes.status === 'rejected' || proxyRes.status === 'rejected') {
|
|
||||||
setError('Einstellungen konnten nicht geladen werden.');
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
const s = settingsRes.value.data;
|
|
||||||
setSettings(s);
|
setSettings(s);
|
||||||
setProxyEndpoint(proxyRes.value.data.endpoint);
|
setProxyEndpoint(proxyRes.data.endpoint);
|
||||||
setAppVersion(proxyRes.value.data.version);
|
fetchModels(s.ollama_url, s.default_model);
|
||||||
if (usageRes.status === 'fulfilled') setUsageLog(usageRes.value.data.lines);
|
}).catch(() => setError('Einstellungen konnten nicht geladen werden.'));
|
||||||
if (errorRes.status === 'fulfilled') setErrorLog(errorRes.value.data.lines);
|
}, []);
|
||||||
fetchModels(s.ollama_url, s.force_model);
|
|
||||||
});
|
|
||||||
}, [refreshKey]);
|
|
||||||
|
|
||||||
const handleSave = async (e) => {
|
const handleSave = async (e) => {
|
||||||
e.preventDefault();
|
e.preventDefault();
|
||||||
@ -157,10 +145,6 @@ function SettingsSection({ password, refreshKey }) {
|
|||||||
<small> (Änderung erfordert Neustart)</small>
|
<small> (Änderung erfordert Neustart)</small>
|
||||||
</span>
|
</span>
|
||||||
</div>
|
</div>
|
||||||
<div className="settings-row">
|
|
||||||
<label>Version</label>
|
|
||||||
<span className="settings-value">{appVersion ?? '…'}</span>
|
|
||||||
</div>
|
|
||||||
<div className="settings-row">
|
<div className="settings-row">
|
||||||
<label>Ollama-Endpunkt</label>
|
<label>Ollama-Endpunkt</label>
|
||||||
<div className="settings-input-wrap">
|
<div className="settings-input-wrap">
|
||||||
@ -168,7 +152,7 @@ function SettingsSection({ password, refreshKey }) {
|
|||||||
type="url"
|
type="url"
|
||||||
value={settings.ollama_url}
|
value={settings.ollama_url}
|
||||||
onChange={(e) => setSettings({ ...settings, ollama_url: e.target.value })}
|
onChange={(e) => setSettings({ ...settings, ollama_url: e.target.value })}
|
||||||
onBlur={(e) => fetchModels(e.target.value, settings.force_model)}
|
onBlur={(e) => fetchModels(e.target.value, settings.default_model)}
|
||||||
placeholder="http://localhost:11434"
|
placeholder="http://localhost:11434"
|
||||||
required
|
required
|
||||||
/>
|
/>
|
||||||
@ -178,40 +162,30 @@ function SettingsSection({ password, refreshKey }) {
|
|||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
<div className="settings-row">
|
<div className="settings-row">
|
||||||
<label>Aktives Modell (Lock)</label>
|
<label>Standard-Modell</label>
|
||||||
{modelsLoading ? (
|
{modelsLoading ? (
|
||||||
<span className="settings-value">Lade Modelle…</span>
|
<span className="settings-value">Lade Modelle…</span>
|
||||||
) : availableModels.length > 0 ? (
|
) : availableModels.length > 0 ? (
|
||||||
<select
|
<select
|
||||||
value={settings.force_model || ""}
|
value={settings.default_model}
|
||||||
onChange={(e) => setSettings({ ...settings, force_model: e.target.value || null })}
|
onChange={(e) => setSettings({ ...settings, default_model: e.target.value })}
|
||||||
>
|
>
|
||||||
<option value="">— kein Lock —</option>
|
|
||||||
{availableModels.map(m => <option key={m} value={m}>{m}</option>)}
|
{availableModels.map(m => <option key={m} value={m}>{m}</option>)}
|
||||||
</select>
|
</select>
|
||||||
) : (
|
) : (
|
||||||
<input
|
<input
|
||||||
type="text"
|
type="text"
|
||||||
value={settings.force_model || ""}
|
value={settings.default_model}
|
||||||
onChange={(e) => setSettings({ ...settings, force_model: e.target.value || null })}
|
onChange={(e) => setSettings({ ...settings, default_model: e.target.value })}
|
||||||
placeholder="leer = kein Lock"
|
placeholder="llama3"
|
||||||
|
required
|
||||||
/>
|
/>
|
||||||
)}
|
)}
|
||||||
</div>
|
</div>
|
||||||
{error && <div className="error">{error}</div>}
|
{error && <div className="error">{error}</div>}
|
||||||
{saved && <div className="success">Gespeichert.</div>}
|
{saved && <div className="success">Gespeichert.</div>}
|
||||||
<button type="submit">Änderungen übernehmen</button>
|
<button type="submit">Speichern</button>
|
||||||
</form>
|
</form>
|
||||||
<div className="log-section">
|
|
||||||
<h3>Nutzungslog (letzte 10 Einträge)</h3>
|
|
||||||
<pre className="log-pre">{usageLog.length > 0 ? usageLog.join('\n') : '— keine Einträge —'}</pre>
|
|
||||||
{errorLog.length > 0 && (
|
|
||||||
<>
|
|
||||||
<h3>Fehlerlog (letzte 10 Einträge)</h3>
|
|
||||||
<pre className="log-pre log-pre-error">{errorLog.join('\n')}</pre>
|
|
||||||
</>
|
|
||||||
)}
|
|
||||||
</div>
|
|
||||||
</section>
|
</section>
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
@ -226,31 +200,21 @@ function App() {
|
|||||||
const [creating, setCreating] = useState(false);
|
const [creating, setCreating] = useState(false);
|
||||||
const [editKey, setEditKey] = useState(null);
|
const [editKey, setEditKey] = useState(null);
|
||||||
const [editForm, setEditForm] = useState({});
|
const [editForm, setEditForm] = useState({});
|
||||||
const [refreshKey, setRefreshKey] = useState(0);
|
|
||||||
const [lastUpdated, setLastUpdated] = useState(null);
|
useEffect(() => {
|
||||||
|
if (!password) { setLoading(false); return; }
|
||||||
|
fetchApiKeys().finally(() => setLoading(false));
|
||||||
|
}, [password]);
|
||||||
|
|
||||||
const fetchApiKeys = async () => {
|
const fetchApiKeys = async () => {
|
||||||
try {
|
try {
|
||||||
const res = await axios.get('/api/api-keys', { headers: authHeaders(password) });
|
const res = await axios.get('/api/api-keys', { headers: authHeaders(password) });
|
||||||
setApiKeys(res.data);
|
setApiKeys(res.data);
|
||||||
setLastUpdated(new Date());
|
|
||||||
} catch {
|
} catch {
|
||||||
setError('API-Keys konnten nicht geladen werden.');
|
setError('API-Keys konnten nicht geladen werden.');
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
useEffect(() => {
|
|
||||||
if (!password) { setLoading(false); return; }
|
|
||||||
fetchApiKeys().finally(() => setLoading(false));
|
|
||||||
|
|
||||||
const timer = setInterval(() => {
|
|
||||||
fetchApiKeys();
|
|
||||||
setRefreshKey(k => k + 1);
|
|
||||||
}, 5 * 60 * 1000);
|
|
||||||
|
|
||||||
return () => clearInterval(timer);
|
|
||||||
}, [password]);
|
|
||||||
|
|
||||||
const handleCreate = async (e) => {
|
const handleCreate = async (e) => {
|
||||||
e.preventDefault();
|
e.preventDefault();
|
||||||
setCreating(true);
|
setCreating(true);
|
||||||
@ -331,7 +295,6 @@ function App() {
|
|||||||
|
|
||||||
const logout = () => {
|
const logout = () => {
|
||||||
sessionStorage.removeItem('admin_password');
|
sessionStorage.removeItem('admin_password');
|
||||||
setLastUpdated(null);
|
|
||||||
setPassword(null);
|
setPassword(null);
|
||||||
};
|
};
|
||||||
|
|
||||||
@ -343,17 +306,10 @@ function App() {
|
|||||||
<div className="container">
|
<div className="container">
|
||||||
<div className="header">
|
<div className="header">
|
||||||
<h1>Ollama Proxy Admin</h1>
|
<h1>Ollama Proxy Admin</h1>
|
||||||
<div className="header-right">
|
<button onClick={logout}>Abmelden</button>
|
||||||
{lastUpdated && (
|
|
||||||
<span className="last-updated">
|
|
||||||
Aktualisiert: {lastUpdated.toLocaleTimeString('de-DE', { hour: '2-digit', minute: '2-digit' })}
|
|
||||||
</span>
|
|
||||||
)}
|
|
||||||
<button onClick={logout}>Abmelden</button>
|
|
||||||
</div>
|
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<SettingsSection password={password} refreshKey={refreshKey} />
|
<SettingsSection password={password} />
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<h2>Neuer API-Key</h2>
|
<h2>Neuer API-Key</h2>
|
||||||
|
|||||||
@ -182,7 +182,6 @@ tr:hover {
|
|||||||
.settings-row label {
|
.settings-row label {
|
||||||
width: 160px;
|
width: 160px;
|
||||||
flex-shrink: 0;
|
flex-shrink: 0;
|
||||||
font-size: 14px;
|
|
||||||
font-weight: 500;
|
font-weight: 500;
|
||||||
color: #2c3e50;
|
color: #2c3e50;
|
||||||
}
|
}
|
||||||
@ -409,7 +408,7 @@ tr:hover {
|
|||||||
.edit-form label small {
|
.edit-form label small {
|
||||||
font-weight: 400;
|
font-weight: 400;
|
||||||
color: #999;
|
color: #999;
|
||||||
font-size: 12px;
|
font-size: 11px;
|
||||||
}
|
}
|
||||||
|
|
||||||
.edit-form input {
|
.edit-form input {
|
||||||
@ -453,46 +452,3 @@ tr:hover {
|
|||||||
.btn-cancel:hover {
|
.btn-cancel:hover {
|
||||||
background: #7f8c8d;
|
background: #7f8c8d;
|
||||||
}
|
}
|
||||||
|
|
||||||
.log-section {
|
|
||||||
margin-top: 24px;
|
|
||||||
border-top: 1px solid #eee;
|
|
||||||
padding-top: 16px;
|
|
||||||
}
|
|
||||||
|
|
||||||
.log-section h3 {
|
|
||||||
font-size: 14px;
|
|
||||||
font-weight: 600;
|
|
||||||
color: #34495e;
|
|
||||||
margin: 0 0 6px;
|
|
||||||
}
|
|
||||||
|
|
||||||
.log-pre {
|
|
||||||
background: #1e2a35;
|
|
||||||
color: #c8d6df;
|
|
||||||
font-family: 'Menlo', 'Consolas', monospace;
|
|
||||||
font-size: 11px;
|
|
||||||
line-height: 1.6;
|
|
||||||
padding: 10px 14px;
|
|
||||||
border-radius: 4px;
|
|
||||||
margin: 0 0 14px;
|
|
||||||
overflow-x: auto;
|
|
||||||
white-space: pre;
|
|
||||||
}
|
|
||||||
|
|
||||||
.log-pre-error {
|
|
||||||
background: #2d1b1b;
|
|
||||||
color: #f5a0a0;
|
|
||||||
margin-bottom: 0;
|
|
||||||
}
|
|
||||||
|
|
||||||
.header-right {
|
|
||||||
display: flex;
|
|
||||||
align-items: center;
|
|
||||||
gap: 16px;
|
|
||||||
}
|
|
||||||
|
|
||||||
.last-updated {
|
|
||||||
font-size: 12px;
|
|
||||||
color: #95a5a6;
|
|
||||||
}
|
|
||||||
|
|||||||
@ -11,7 +11,6 @@ export default defineConfig({
|
|||||||
'/api/settings': 'http://localhost:8001',
|
'/api/settings': 'http://localhost:8001',
|
||||||
'/api/ollama-models': 'http://localhost:8001',
|
'/api/ollama-models': 'http://localhost:8001',
|
||||||
'/api/proxy-info': 'http://localhost:8001',
|
'/api/proxy-info': 'http://localhost:8001',
|
||||||
'/api/logs': 'http://localhost:8001',
|
|
||||||
'/api': 'http://localhost:8000',
|
'/api': 'http://localhost:8000',
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
|
|||||||
8
run_tests.py
Normal file
@ -0,0 +1,8 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""Pytest runner for Ollama Proxy tests."""
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
result = subprocess.run([sys.executable, "-m", "pytest"] + sys.argv[1:], cwd="backend")
|
||||||
|
sys.exit(result.returncode)
|
||||||
14
start.sh
@ -1,19 +1,17 @@
|
|||||||
#!/bin/bash
|
#!/bin/bash
|
||||||
|
|
||||||
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
|
|
||||||
|
|
||||||
# Load .env
|
# Load .env
|
||||||
if [ -f "$SCRIPT_DIR/.env" ]; then
|
if [ -f .env ]; then
|
||||||
set -a
|
set -a
|
||||||
source "$SCRIPT_DIR/.env"
|
source .env
|
||||||
set +a
|
set +a
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# Activate the virtual environment if present
|
# Activate the virtual environment if present
|
||||||
if [ -f "$SCRIPT_DIR/.venv/bin/activate" ]; then
|
if [ -f .venv/bin/activate ]; then
|
||||||
source "$SCRIPT_DIR/.venv/bin/activate"
|
source .venv/bin/activate
|
||||||
elif [ -f "$SCRIPT_DIR/venv/bin/activate" ]; then
|
elif [ -f venv/bin/activate ]; then
|
||||||
source "$SCRIPT_DIR/venv/bin/activate"
|
source venv/bin/activate
|
||||||
fi
|
fi
|
||||||
|
|
||||||
if [ -z "$ADMIN_PASSWORD" ]; then
|
if [ -z "$ADMIN_PASSWORD" ]; then
|
||||||
|
|||||||
@ -1,33 +0,0 @@
|
|||||||
#!/bin/bash
|
|
||||||
|
|
||||||
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
|
|
||||||
|
|
||||||
# Load .env
|
|
||||||
if [ -f "$SCRIPT_DIR/.env" ]; then
|
|
||||||
set -a
|
|
||||||
source "$SCRIPT_DIR/.env"
|
|
||||||
set +a
|
|
||||||
fi
|
|
||||||
|
|
||||||
# API key: the first argument takes precedence, otherwise the PROXY_API_KEY environment variable
|
|
||||||
API_KEY="${1:-$PROXY_API_KEY}"
|
|
||||||
|
|
||||||
if [ -z "$API_KEY" ]; then
|
|
||||||
echo "Fehler: Kein API-Key angegeben."
|
|
||||||
echo "Verwendung: ./start_claude.sh sk-dein-key"
|
|
||||||
echo " oder: PROXY_API_KEY=sk-dein-key ./start_claude.sh"
|
|
||||||
exit 1
|
|
||||||
fi
|
|
||||||
|
|
||||||
# 0.0.0.0 is a bind address, not a valid client host
|
|
||||||
PROXY_HOST="${PROXY_HOST:-0.0.0.0}"
|
|
||||||
PROXY_PORT="${PROXY_PORT:-8000}"
|
|
||||||
if [ "$PROXY_HOST" = "0.0.0.0" ]; then
|
|
||||||
PROXY_HOST="localhost"
|
|
||||||
fi
|
|
||||||
|
|
||||||
export ANTHROPIC_BASE_URL="http://${PROXY_HOST}:${PROXY_PORT}"
|
|
||||||
export ANTHROPIC_AUTH_TOKEN="$API_KEY"
|
|
||||||
|
|
||||||
echo "Verbinde mit Proxy: $ANTHROPIC_BASE_URL"
|
|
||||||
exec claude
|
|
||||||
7
test_api.sh
Normal file
@ -0,0 +1,7 @@
|
|||||||
|
curl -X POST http://localhost:8000/api/generate \
|
||||||
|
-H "Authorization: sk-admin-key" \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{
|
||||||
|
"model": "llama3",
|
||||||
|
"prompt": "Test"
|
||||||
|
}'
|
||||||