Compare commits


41 Commits
0.9.4 ... main

Author SHA1 Message Date
Oliver Hofmann
92ed7368eb Remove redundant run_tests.py wrapper 2026-05-10 11:51:13 +02:00
Oliver Hofmann
5a50d0be04 Remove obsolete test_api.sh 2026-05-10 11:47:15 +02:00
Oliver Hofmann
21cab46365 Remove docs/ from tracking, add to gitignore 2026-05-10 11:44:10 +02:00
Oliver Hofmann
fcaea9e3a9 Add license section to DockerHub descriptions 2026-05-10 11:21:39 +02:00
Oliver Hofmann
eb83c52b7f Rename settings save button to 'Änderungen übernehmen' ("Apply changes") 2026-05-10 10:50:09 +02:00
Oliver Hofmann
f551b2a421 Harmonize typography: remove log uppercase, normalize label font sizes 2026-05-10 10:45:41 +02:00
Oliver Hofmann
79a30dd179 Route /api/logs to admin API in Vite proxy config 2026-05-10 10:40:41 +02:00
Oliver Hofmann
0353e0299f Use Promise.allSettled so log fetch failures don't break settings display 2026-05-10 10:34:28 +02:00
Oliver Hofmann
5a94fc6d90 Reset lastUpdated on logout 2026-05-10 10:22:33 +02:00
Oliver Hofmann
cf1b3f7786 Add 5-minute auto-reload and last-updated timestamp to admin UI 2026-05-10 10:20:03 +02:00
Oliver Hofmann
02b4ad06ca Fix pre whitespace, log-pre-error margin, error log heading 2026-05-10 10:18:27 +02:00
Oliver Hofmann
ca55783b90 Show last 10 log lines in settings section 2026-05-10 10:15:48 +02:00
Oliver Hofmann
a9b0168c71 Add GET /api/logs/{name} endpoint to admin API 2026-05-10 10:11:55 +02:00
Oliver Hofmann
7ce4d3a895 Add implementation plan: log viewer and auto-reload 2026-05-10 10:09:34 +02:00
Oliver Hofmann
fff9d1048d Add design spec: log viewer and auto-reload for admin UI 2026-05-10 10:07:44 +02:00
Oliver Hofmann
cdd55880d6 Remove unnecessary bold formatting from Anthropic API feature entries 2026-05-10 10:00:34 +02:00
Oliver Hofmann
6b2ae4b072 Remove BACKEND_API_KEY from public documentation 2026-05-10 09:58:50 +02:00
Oliver Hofmann
4c8a8d4afb Update Kurzanleitung: add gemma4:e4b, Claude Code section 2026-05-10 09:56:35 +02:00
Oliver Hofmann
9872175fb0 Add LICENSE, update docs with Anthropic endpoint and free-claude-code attribution 2026-05-10 09:53:12 +02:00
Oliver Hofmann
cc3ee5a03c Add Anthropic Messages API compatibility layer (/v1/messages)
- POST /v1/messages endpoint with full quota enforcement and auth
- Accepts x-api-key and anthropic-auth-token headers (for Claude Code)
- Transforms Anthropic request/response format ↔ Ollama /api/chat
- Streaming support via Anthropic SSE format
- Tool use support (request and response transformation)
- ANTHROPIC_DEFAULT_MODEL env var for model selection without admin UI
- BACKEND_API_KEY env var for forwarding auth to upstream proxies
- Fix SQLite path always resolved relative to database.py location
- start.sh and start_claude.sh load .env relative to script location
2026-05-10 09:45:38 +02:00
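The request half of the Anthropic ↔ Ollama transformation in this commit can be sketched in Python. This is a minimal illustration under stated assumptions, not the proxy's actual code. The helper names `anthropic_to_ollama` and `_text_of` are hypothetical. Only text content is handled, so tool use, images and streaming conversion are left out. The model fallback follows the `ANTHROPIC_DEFAULT_MODEL` comment in the `.env` example of this diff: use the default when the client sends no model, or sends a Claude model name that is not installed locally. Mapping `max_tokens` to Ollama's `options.num_predict` is also an assumption.

```python
from typing import Any


def _text_of(content: Any) -> str:
    """Flatten Anthropic content (plain string or list of blocks) to text.

    Only "text" blocks are kept; image, tool_use and tool_result blocks
    are ignored in this sketch.
    """
    if isinstance(content, str):
        return content
    return "".join(
        block.get("text", "")
        for block in content
        if isinstance(block, dict) and block.get("type") == "text"
    )


def anthropic_to_ollama(
    body: dict[str, Any], local_models: set[str], default_model: str
) -> dict[str, Any]:
    """Map an Anthropic Messages request body to an Ollama /api/chat body."""
    # Fall back to the default model when the client sends no model or an
    # Anthropic name (e.g. "claude-opus-4-7") that is not installed locally.
    requested = body.get("model")
    model = requested if requested in local_models else default_model

    messages: list[dict[str, str]] = []
    # Anthropic carries the system prompt in a top-level "system" field;
    # Ollama expects it as a message with role "system".
    if body.get("system"):
        messages.append({"role": "system", "content": _text_of(body["system"])})
    for msg in body.get("messages", []):
        messages.append({"role": msg["role"], "content": _text_of(msg["content"])})

    payload: dict[str, Any] = {
        "model": model,
        "messages": messages,
        "stream": bool(body.get("stream", False)),
    }
    # Assumption: carry the output limit over as Ollama's num_predict option.
    if "max_tokens" in body:
        payload["options"] = {"num_predict": body["max_tokens"]}
    return payload
```

The response direction would presumably do the reverse. It would wrap Ollama's `message.content` in a `content` list of `{"type": "text", ...}` blocks, and it would report `prompt_eval_count`/`eval_count` as Anthropic's `usage.input_tokens`/`usage.output_tokens`.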
Oliver Hofmann
70fd61608b Log actual tokens and elapsed time for all endpoints incl. streaming
For streaming /v1/chat/completions: inject stream_options.include_usage,
parse usage from SSE chunks, log actual ↑↓ tokens and wall time in the
generator's finally block. Add elapsed time to all second log entries.
2026-05-08 09:47:32 +02:00
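The streaming bookkeeping described in this commit can be sketched with the standard library. The sketch assumes the upstream honours OpenAI's `stream_options.include_usage` and sends the totals in one final chunk whose `usage` field holds `prompt_tokens` and `completion_tokens`. `with_include_usage`, `relay_and_log` and the `log` callback are hypothetical names. The proxy's real web framework and log format are not shown here.

```python
import json
import time
from typing import Any, Callable, Iterable, Iterator, Optional

LogFn = Callable[[Optional[int], Optional[int], float], None]


def with_include_usage(body: dict[str, Any]) -> dict[str, Any]:
    """Ask the upstream to append a usage chunk to a streamed chat completion."""
    if not body.get("stream"):
        return body  # non-streaming responses already contain "usage"
    options = {**(body.get("stream_options") or {}), "include_usage": True}
    return {**body, "stream_options": options}


def relay_and_log(lines: Iterable[str], log: LogFn) -> Iterator[str]:
    """Forward SSE lines unchanged and log tokens plus wall time at the end."""
    start = time.monotonic()
    prompt_tokens: Optional[int] = None
    completion_tokens: Optional[int] = None
    try:
        for line in lines:
            stripped = line.strip()
            if stripped.startswith("data:"):
                payload = stripped[len("data:"):].strip()
                if payload != "[DONE]":
                    usage = json.loads(payload).get("usage")
                    if usage:  # only the final chunk carries token totals
                        prompt_tokens = usage.get("prompt_tokens")
                        completion_tokens = usage.get("completion_tokens")
            yield line
    finally:
        # Runs when the stream ends, when parsing fails, or when the
        # generator is closed early by its consumer.
        log(prompt_tokens, completion_tokens, time.monotonic() - start)
```

Placing the log call in `finally` is what produces exactly one entry per request, even for aborted streams.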
Oliver Hofmann
07f6fec4bf Show app version in admin UI and /version endpoint
Embed APP_VERSION build arg in Docker image (default: dev).
build_push.sh passes the git tag as build arg. Proxy exposes
GET /version, admin UI shows it as read-only field in settings.
2026-05-08 09:30:23 +02:00
Oliver Hofmann
6761a73364 Add /api/ps example to Kurzanleitung 2026-05-08 09:25:44 +02:00
Oliver Hofmann
0d1ce96c99 Expose /api/ps to show currently loaded model 2026-05-08 09:22:17 +02:00
Oliver Hofmann
b16b3af44d Mention VPN as alias for Intranet in Kurzanleitung 2026-05-08 09:17:08 +02:00
Oliver Hofmann
fdbe0a74e8 Exclude .tex and .pdf from Docker build context 2026-05-08 08:41:49 +02:00
Oliver Hofmann
0154c89c6b Improve Python examples and opencode description in Kurzanleitung
Split second example into focused model listing only, remove repeated
prompt code. Replace TUI with 'interaktive Terminal-Oberfläche' ("interactive terminal interface").
2026-05-08 08:40:25 +02:00
Oliver Hofmann
e3dbed9f5e Ignore generated KURZANLEITUNG.tex and .pdf 2026-05-08 08:15:07 +02:00
Oliver Hofmann
5b37718120 Fix list indentation in KURZANLEITUNG.md 2026-05-08 08:07:02 +02:00
Oliver Hofmann
f823e7d314 Return 422 when model field is missing and no force_model is set
Prevents an opaque Ollama error from reaching the client by failing fast
with a clear message before the request is forwarded.
2026-05-08 08:04:52 +02:00
Oliver Hofmann
34b108f4df Replace default_model with force_model (model lock)
Removes DEFAULT_MODEL in favour of a force_model setting configurable
via the admin UI. When set, every proxy request's model field is
overridden, preventing uncoordinated model switches during lab sessions.
Updates schemas, admin API, all three proxy endpoints, frontend,
init_db, and docs (README, DOCKERHUB, KURZANLEITUNG).
2026-05-08 08:02:16 +02:00
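The model-lock rule and the fail-fast rule from the two commits above combine into one small resolution step. This is a framework-agnostic sketch that assumes the request body arrives as a dict. `resolve_model` and `ModelMissingError` are hypothetical names. In a web framework, the exception would be turned into an HTTP 422 response before anything is forwarded to Ollama.

```python
from typing import Any, Optional


class ModelMissingError(ValueError):
    """Raised instead of forwarding a request that names no model."""

    status_code = 422  # what the HTTP layer would return to the client


def resolve_model(body: dict[str, Any], force_model: Optional[str]) -> dict[str, Any]:
    """Return a copy of the request body with the effective model filled in."""
    if force_model:
        # Model lock: override whatever the client asked for.
        return {**body, "model": force_model}
    if not body.get("model"):
        # No lock and no model: fail fast with a clear message instead of
        # letting Ollama produce an opaque error.
        raise ModelMissingError("'model' is required when no model lock is set")
    return dict(body)
```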
Oliver Hofmann
cced65693c Log actual Ollama token counts and add user guide
Add a second usage log line after each proxy response with actual ↑prompt ↓completion
token counts from Ollama (prompt_eval_count/eval_count for native endpoints,
usage object for OpenAI endpoint). Also adds KURZANLEITUNG.md for students and
colleagues covering API access, model selection, Python examples, opencode setup,
and quota/admin information.
2026-05-08 07:21:36 +02:00
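Reading the actual token counts means handling two response shapes. This minimal sketch assumes non-streaming JSON bodies:
- Native Ollama endpoints report `prompt_eval_count` and `eval_count` at the top level.
- The OpenAI-compatible endpoint returns a `usage` object with `prompt_tokens` and `completion_tokens`.

The function names are hypothetical. The `↑`/`↓` string only mirrors the notation in the commit message, not a documented log format.

```python
from typing import Any, Optional


def token_counts(response: dict[str, Any]) -> tuple[Optional[int], Optional[int]]:
    """Return (prompt_tokens, completion_tokens) from an Ollama or OpenAI-style body."""
    usage = response.get("usage")
    if isinstance(usage, dict):
        # OpenAI-compatible endpoint (/v1/chat/completions)
        return usage.get("prompt_tokens"), usage.get("completion_tokens")
    # Native Ollama endpoints such as /api/chat and /api/generate
    return response.get("prompt_eval_count"), response.get("eval_count")


def usage_suffix(response: dict[str, Any]) -> str:
    """Format the counts as ↑prompt ↓completion (hypothetical log format)."""
    prompt, completion = token_counts(response)
    return f"↑{prompt} ↓{completion}"
```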
Oliver Hofmann
256bafe30d Explain network_mode: host motivation in README 2026-05-07 16:14:44 +02:00
Oliver Hofmann
555d9899fe Fix build_push.sh tag detection 2026-05-07 16:11:50 +02:00
Oliver Hofmann
31504d1a5b Remove DOCKERHUB.md reference from README 2026-05-07 16:09:22 +02:00
Oliver Hofmann
2e7b13227d Make ADMIN_HOST consistent across dev and prod 2026-05-07 16:08:05 +02:00
Oliver Hofmann
5469981eb5 Add ADMIN_HOST to env tables and .env example 2026-05-07 16:04:23 +02:00
Oliver Hofmann
a1e293b1d7 Add ADMIN_HOST env var, restructure docs
- docker-entrypoint.sh: admin API binds to ADMIN_HOST (default 0.0.0.0)
  instead of a hardcoded 0.0.0.0, which allows restricting it to 127.0.0.1
- README: description of purpose, HTTPS reverse proxy section (Caddy/Nginx),
  corrected port 8001 section (Docker port mapping has no effect with
  network_mode: host), ADMIN_HOST added to the config table
- DOCKERHUB.md / DOCKERHUB.en.md: reduced to three scenarios
  (network_mode: host, Ollama as container + SQLite/PostgreSQL);
  host.docker.internal variants removed
- review_priorities.md: deleted (all items resolved)
2026-05-07 16:03:03 +02:00
9f92c09586 Add container_name, remove ineffective extra_hosts with network_mode host 2026-05-07 15:09:04 +02:00
3974010156 Use network_mode: host so container can reach Ollama on 127.0.0.1 2026-05-07 13:58:26 +02:00
8d3f9a7661 Fix OpenAI array content, add error logging, Ollama reachability warning
- Normalize OpenAI array-format content to string to fix connection reset
- Add error.log with rotating handler for proxy and stream errors
- Add global unhandled exception handler returning JSON 500
- Write OLLAMA_URL/DEFAULT_MODEL env vars to DB on startup (reset on restart)
- Add extra_hosts to docker-compose.yml for host.docker.internal on Linux
- Show warning in admin UI when Ollama URL is unreachable
- Return reachable: true/false from /api/ollama-models endpoint
2026-05-07 11:43:17 +02:00
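The first bullet of this commit, normalizing array-format content, might look like the following. OpenAI clients may send `content` as a list of parts such as `{"type": "text", "text": "..."}`. This sketch keeps only the text parts and joins them with newlines. The helper name and the newline separator are assumptions. Non-text parts (e.g. `image_url`) are simply dropped here.

```python
from typing import Any


def normalize_content(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a new list in which list-valued content is flattened to a string.

    The input dicts are not modified.
    """
    normalized = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            # e.g. [{"type": "text", "text": "..."}, {"type": "image_url", ...}]
            texts = [
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            ]
            msg = {**msg, "content": "\n".join(texts)}
        normalized.append(msg)
    return normalized
```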
27 changed files with 1381 additions and 302 deletions


@@ -42,6 +42,8 @@ docker-compose.yml
 # Docs
 *.md
+*.tex
+*.pdf
 # Dev & build scripts
 run_dev.py


@@ -16,3 +16,8 @@ DATABASE_URL=sqlite:///./test.db
 OLLAMA_URL=http://localhost:11434
 DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
+# Default model for the Anthropic-compatible endpoint (/v1/messages)
+# Used when the client does not specify a model or when an Anthropic model name
+# (e.g. claude-opus-4-7) does not match any local model.
+ANTHROPIC_DEFAULT_MODEL=llama3

.gitignore (7 lines changed)

@@ -27,3 +27,10 @@ frontend/dist/
 # Misc
 config.json
+# Generated documents
+KURZANLEITUNG.tex
+KURZANLEITUNG.pdf
+# Internal planning docs
+docs/


@@ -1,14 +1,14 @@
 # mediaeng/llmproxy
-A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with configurable token and request quotas. Incoming requests in OpenAI-compatible format are authenticated, checked against the quota, and forwarded to the configured Ollama server.
+A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with configurable token and request quotas. Incoming requests in OpenAI-compatible or Anthropic-compatible format are authenticated, checked against the quota, and forwarded to the configured Ollama server.
-Ollama does not need to run on the same host; `OLLAMA_URL` can point to any reachable server: the Docker host itself, another machine on the network, or a remote server.
 ## Features
 - OpenAI-compatible endpoint (`/v1/chat/completions`, `/v1/models`)
+- Anthropic Messages API (`/v1/messages`), compatible with Claude Code CLI and Anthropic SDK clients
 - API key management with daily and monthly token/request limits
 - Web-based admin interface (port 8001)
+- Model lock: enforces a specific model for all requests (useful for courses and lab sessions)
 - Streaming support (Server-Sent Events)
 - Tool use / function calling passthrough
 - Rotating usage logs
@@ -18,19 +18,10 @@ Ollama does not need to run on the same host; `OLLAMA_URL` can point to any r
 | Port | Service |
 |------|---------|
-| `8000` | Proxy endpoint (OpenAI API) |
+| `8000` | Proxy endpoint (OpenAI and Anthropic API) |
 | `8001` | Admin API + web interface |
-Port 8001 must be exposed because the container serves the admin interface directly on this port. All API endpoints require the `ADMIN_PASSWORD`; without a valid token, only the public frontend files (HTML/JS/CSS of the login page) are accessible. The password is therefore the primary protection.
+All API endpoints require the `ADMIN_PASSWORD`; without a valid token, only the public frontend files (HTML/JS/CSS of the login page) are accessible. The password is therefore the primary protection.
-Additional hardening: binding to `127.0.0.1` restricts access to the local host and prevents direct network access:
-```
-ports:
-  - "127.0.0.1:8001:8001"   # local access only
-  # or:
-  - "8001:8001"             # network-wide, protected by ADMIN_PASSWORD only
-```
 ## Environment Variables
@@ -38,88 +29,45 @@ ports:
 |----------|---------|-------------|
 | `ADMIN_PASSWORD` | | **Required.** Password for the admin interface |
 | `OLLAMA_URL` | `http://localhost:11434` | URL of the Ollama server (without `/v1` suffix) |
-| `DEFAULT_MODEL` | `llama3` | Model used when the client does not specify one |
 | `DATABASE_URL` | `sqlite:///./test.db` | Database connection string (SQLite or PostgreSQL) |
 | `PROXY_HOST` | `0.0.0.0` | Proxy bind address |
 | `PROXY_PORT` | `8000` | Proxy port |
+| `ADMIN_HOST` | `0.0.0.0` | Admin API bind address (`127.0.0.1` to restrict to local access) |
 | `ADMIN_PORT` | `8001` | Admin API port |
 | `APP_TZ` | `Europe/Berlin` | Timezone for daily/monthly quota resets |
 | `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
+| `ANTHROPIC_DEFAULT_MODEL` | | Default model for `/v1/messages` (Ollama model name, e.g. `llama3`) |
-## Docker Compose: External Ollama, SQLite
+## Docker Compose: Ollama on the Host (Linux, recommended)
-Use this when Ollama runs outside of Docker, on the Docker host or any other reachable server. Adjust `OLLAMA_URL` accordingly.
+`network_mode: host` gives the container direct access to the host network stack. Ollama runs on the host and is reachable at `localhost:11434` but is not visible from outside. The proxy and admin interface are available directly on host ports 8000 and 8001.
 ```yaml
 services:
   llmproxy:
     image: mediaeng/llmproxy:latest
+    container_name: llmproxy
     restart: unless-stopped
-    ports:
-      - "8000:8000"
-      - "127.0.0.1:8001:8001"
-    environment:
-      ADMIN_PASSWORD: changeme
-      OLLAMA_URL: http://host.docker.internal:11434  # or http://<ip>:11434
-      DEFAULT_MODEL: llama3
-      APP_TZ: Europe/Berlin
+    network_mode: host
+    env_file: .env
     volumes:
       - llmproxy-data:/app/backend
-    # On Linux, add extra_hosts since host.docker.internal is not
-    # available automatically:
-    # extra_hosts:
-    #   - "host.docker.internal:host-gateway"
 volumes:
   llmproxy-data:
 ```
-## Docker Compose: External Ollama, PostgreSQL
-```yaml
-services:
-  llmproxy:
-    image: mediaeng/llmproxy:latest
-    restart: unless-stopped
-    ports:
-      - "8000:8000"
-      - "127.0.0.1:8001:8001"
-    environment:
-      ADMIN_PASSWORD: changeme
-      OLLAMA_URL: http://host.docker.internal:11434  # or http://<ip>:11434
-      DEFAULT_MODEL: llama3
-      APP_TZ: Europe/Berlin
-      DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
-    volumes:
-      - llmproxy-data:/app/backend
-    depends_on:
-      db:
-        condition: service_healthy
-    # extra_hosts:
-    #   - "host.docker.internal:host-gateway"
-  db:
-    image: postgres:16-alpine
-    restart: unless-stopped
-    environment:
-      POSTGRES_DB: llmproxy
-      POSTGRES_USER: llmproxy
-      POSTGRES_PASSWORD: secret
-    volumes:
-      - pg-data:/var/lib/postgresql/data
-    healthcheck:
-      test: ["CMD-SHELL", "pg_isready -U llmproxy"]
-      interval: 5s
-      timeout: 5s
-      retries: 5
-volumes:
-  pg-data:
+`.env`:
+```env
+ADMIN_PASSWORD=changeme
+OLLAMA_URL=http://localhost:11434
+APP_TZ=Europe/Berlin
+ANTHROPIC_DEFAULT_MODEL=llama3
 ```
 ## Docker Compose: Ollama as Container, SQLite
-Ollama and llmproxy run together in Docker, data persisted in a volume.
+Ollama and llmproxy run together in Docker. Ollama is not exposed externally.
 ```yaml
 services:
@@ -128,12 +76,12 @@ services:
     restart: unless-stopped
     ports:
       - "8000:8000"
-      - "127.0.0.1:8001:8001"
+      - "8001:8001"
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
       APP_TZ: Europe/Berlin
+      ANTHROPIC_DEFAULT_MODEL: llama3
     volumes:
       - llmproxy-data:/app/backend
     depends_on:
@@ -161,13 +109,13 @@ services:
     restart: unless-stopped
     ports:
       - "8000:8000"
-      - "127.0.0.1:8001:8001"
+      - "8001:8001"
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
       APP_TZ: Europe/Berlin
       DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
+      ANTHROPIC_DEFAULT_MODEL: llama3
     depends_on:
       db:
         condition: service_healthy
@@ -200,22 +148,25 @@ volumes:
   ollama-data:
 ```
-## Quick Start
-```bash
-docker run -d \
-  -p 8000:8000 \
-  -e ADMIN_PASSWORD=changeme \
-  -e OLLAMA_URL=http://host.docker.internal:11434 \
-  -v llmproxy-data:/app/backend \
-  mediaeng/llmproxy:latest
-```
 ## Client Configuration
-Configure the proxy as an OpenAI-compatible endpoint:
+**OpenAI-compatible client:**
 ```
 Base URL: http://<host>:8000/v1
 API Key: <API key created in the admin interface>
 ```
+**Claude Code CLI:**
+```bash
+ANTHROPIC_BASE_URL=http://<host>:8000 \
+ANTHROPIC_AUTH_TOKEN=<API key created in the admin interface> \
+claude
+```
+## Acknowledgements
+The Anthropic Messages API endpoint (`/v1/messages`) was inspired by [free-claude-code](https://github.com/Alishahryar1/free-claude-code) by Ali Khokhar, which pursues a similar approach for routing Claude Code requests to alternative LLM backends.
+## License
+MIT, © 2026 Oliver Hofmann. See [LICENSE](https://git.efi.th-nuernberg.de/gitea/hofmannol/llmproxy/src/branch/main/LICENSE) for details.


@@ -1,14 +1,14 @@
 # mediaeng/llmproxy
-A lean reverse proxy for [Ollama](https://ollama.com) that manages API keys with configurable token and request quotas. Incoming requests in OpenAI-compatible format are authenticated, checked against the quota, and forwarded to the configured Ollama server.
+A lean reverse proxy for [Ollama](https://ollama.com) that manages API keys with configurable token and request quotas. Incoming requests in OpenAI-compatible or Anthropic-compatible format are authenticated, checked against the quota, and forwarded to the configured Ollama server.
-Ollama does not have to run on the same host; `OLLAMA_URL` can point to any reachable server, i.e. the Docker host itself, another machine on the network, or a remote server.
 ## Features
 - OpenAI-compatible endpoint (`/v1/chat/completions`, `/v1/models`)
+- Anthropic Messages API (`/v1/messages`), compatible with Claude Code CLI and Anthropic SDK clients
 - API key management with daily and monthly token/request limits
 - Web-based admin interface (port 8001)
+- Model lock: enforces a specific model for all requests (useful for lab sessions/courses)
 - Streaming support (Server-Sent Events)
 - Tool use / function calling is passed through
 - Rotating usage logs
@@ -18,19 +18,10 @@ Ollama does not have to run on the same host; `OLLAMA_URL` can point to any
 | Port | Service |
 |------|--------|
-| `8000` | Proxy endpoint (OpenAI API) |
+| `8000` | Proxy endpoint (OpenAI and Anthropic API) |
 | `8001` | Admin API + web interface |
-Port 8001 must be exposed because the container itself serves the admin interface on this port. All API endpoints require the `ADMIN_PASSWORD`; access without a valid token only returns the public frontend files (HTML/JS/CSS of the login page). The password is thus the primary safeguard.
+All API endpoints require the `ADMIN_PASSWORD`; access without a valid token only returns the public frontend files (HTML/JS/CSS of the login page). The password is thus the primary safeguard.
-Additional hardening: binding the port to `127.0.0.1` limits access to the local host and prevents direct network access:
-```
-ports:
-  - "127.0.0.1:8001:8001"   # reachable locally only
-  # or:
-  - "8001:8001"             # network-wide, protected only by ADMIN_PASSWORD
-```
 ## Environment Variables
@@ -38,88 +29,45 @@ ports:
 |----------|----------|--------------|
 | `ADMIN_PASSWORD` | | **Required.** Password for the admin interface |
 | `OLLAMA_URL` | `http://localhost:11434` | URL of the Ollama server (without `/v1` suffix) |
-| `DEFAULT_MODEL` | `llama3` | Model that is used when the client does not specify one |
 | `DATABASE_URL` | `sqlite:///./test.db` | Database connection string (SQLite or PostgreSQL) |
 | `PROXY_HOST` | `0.0.0.0` | Bind address of the proxy |
 | `PROXY_PORT` | `8000` | Port of the proxy |
+| `ADMIN_HOST` | `0.0.0.0` | Bind address of the admin API (`127.0.0.1` for local access) |
 | `ADMIN_PORT` | `8001` | Port of the admin API |
 | `APP_TZ` | `Europe/Berlin` | Time zone for the daily/monthly quota reset |
 | `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
+| `ANTHROPIC_DEFAULT_MODEL` | | Default model for `/v1/messages` (Ollama model name, e.g. `llama3`) |
-## Docker Compose: Ollama External, SQLite
+## Docker Compose: Ollama on the Host (Linux, recommended)
-For setups where Ollama runs outside of Docker: on the Docker host or another reachable server. Adjust `OLLAMA_URL` accordingly.
+`network_mode: host` gives the container direct access to the host network. Ollama runs on the host and is reachable via `localhost:11434`, but is not visible from outside. Proxy and admin interface are available directly on host ports 8000 and 8001.
 ```yaml
 services:
   llmproxy:
     image: mediaeng/llmproxy:latest
+    container_name: llmproxy
     restart: unless-stopped
-    ports:
-      - "8000:8000"
-      - "127.0.0.1:8001:8001"
-    environment:
-      ADMIN_PASSWORD: changeme
-      OLLAMA_URL: http://host.docker.internal:11434  # or http://<ip>:11434
-      DEFAULT_MODEL: llama3
-      APP_TZ: Europe/Berlin
+    network_mode: host
+    env_file: .env
     volumes:
       - llmproxy-data:/app/backend
-    # On Linux, add extra_hosts, since host.docker.internal is not
-    # available there automatically:
-    # extra_hosts:
-    #   - "host.docker.internal:host-gateway"
 volumes:
   llmproxy-data:
 ```
-## Docker Compose: Ollama External, PostgreSQL
-```yaml
-services:
-  llmproxy:
-    image: mediaeng/llmproxy:latest
-    restart: unless-stopped
-    ports:
-      - "8000:8000"
-      - "127.0.0.1:8001:8001"
-    environment:
-      ADMIN_PASSWORD: changeme
-      OLLAMA_URL: http://host.docker.internal:11434  # or http://<ip>:11434
-      DEFAULT_MODEL: llama3
-      APP_TZ: Europe/Berlin
-      DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
-    volumes:
-      - llmproxy-data:/app/backend
-    depends_on:
-      db:
-        condition: service_healthy
-    # extra_hosts:
-    #   - "host.docker.internal:host-gateway"
-  db:
-    image: postgres:16-alpine
-    restart: unless-stopped
-    environment:
-      POSTGRES_DB: llmproxy
-      POSTGRES_USER: llmproxy
-      POSTGRES_PASSWORD: secret
-    volumes:
-      - pg-data:/var/lib/postgresql/data
-    healthcheck:
-      test: ["CMD-SHELL", "pg_isready -U llmproxy"]
-      interval: 5s
-      timeout: 5s
-      retries: 5
-volumes:
-  pg-data:
+`.env`:
+```env
+ADMIN_PASSWORD=changeme
+OLLAMA_URL=http://localhost:11434
+APP_TZ=Europe/Berlin
+ANTHROPIC_DEFAULT_MODEL=llama3
 ```
 ## Docker Compose: Ollama as Container, SQLite
-Ollama and llmproxy run together in Docker, data in a volume.
+Ollama and llmproxy run together in Docker. Ollama is not exposed externally.
 ```yaml
 services:
@@ -128,12 +76,12 @@ services:
     restart: unless-stopped
     ports:
       - "8000:8000"
-      - "127.0.0.1:8001:8001"
+      - "8001:8001"
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
       APP_TZ: Europe/Berlin
+      ANTHROPIC_DEFAULT_MODEL: llama3
     volumes:
       - llmproxy-data:/app/backend
     depends_on:
@@ -161,13 +109,13 @@ services:
     restart: unless-stopped
     ports:
       - "8000:8000"
-      - "127.0.0.1:8001:8001"
+      - "8001:8001"
     environment:
       ADMIN_PASSWORD: changeme
       OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
       APP_TZ: Europe/Berlin
       DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
+      ANTHROPIC_DEFAULT_MODEL: llama3
     depends_on:
       db:
         condition: service_healthy
@@ -200,22 +148,21 @@ volumes:
   ollama-data:
 ```
-## Quick Start
-```bash
-docker run -d \
-  -p 8000:8000 \
-  -e ADMIN_PASSWORD=changeme \
-  -e OLLAMA_URL=http://host.docker.internal:11434 \
-  -v llmproxy-data:/app/backend \
-  mediaeng/llmproxy:latest
-```
 ## Client Configuration
-Configure the proxy as an OpenAI-compatible endpoint:
+**OpenAI-compatible client:**
 ```
 Base URL: http://<host>:8000/v1
 API Key: <API key created in the admin interface>
 ```
+**Claude Code CLI:**
+```bash
+ANTHROPIC_BASE_URL=http://<host>:8000 \
+ANTHROPIC_AUTH_TOKEN=<API key> \
+claude
+```
+## License
+MIT, © 2026 Oliver Hofmann. For details, see [LICENSE](https://git.efi.th-nuernberg.de/gitea/hofmannol/llmproxy/src/branch/main/LICENSE).


@@ -6,6 +6,8 @@ COPY frontend/ frontend/
 RUN npm run build --prefix frontend
 FROM python:3.12-slim
+ARG APP_VERSION=dev
+ENV APP_VERSION=$APP_VERSION
 WORKDIR /app
 COPY backend/requirements.txt .

KURZANLEITUNG.md (new file, 204 lines changed)

@@ -0,0 +1,204 @@
# LLM Service: Quick-Start Guide
## What Is This About?
The service provides **large language models (LLMs)** through a simple HTTP API that can be called directly from Python scripts, Jupyter notebooks, or your own applications. The models run locally on a GPU server in the intranet, with no data leaving the network and no cloud costs.
Typical use cases:
- Summarizing, translating, or rephrasing texts
- AI-assisted coding (e.g. with **[opencode](https://opencode.ai)**)
- Experimenting with prompt engineering and integrating LLMs into your own projects
---
## Access
The service is reachable **only from the intranet (VPN)**.
| | |
|---|---|
| **API endpoint** | `http://141.75.33.244:8000` |
| **Authentication** | API key required (request one from the admin by e-mail) |
---
## Available Models
| Model | Size | Notes |
|---|---|---|
| `gemma4:e4b` | 9.6 GB | very fast, for simple tasks |
| `gemma4:31b` | 19 GB | compact, fast |
| `gpt-oss:20b` | 13 GB | compact, fast |
| `gpt-oss:120b` | 65 GB | very capable |
| `qwen3.5:122b` | 81 GB | very capable |
| `qwen3-coder-next:q8_0` | 84 GB | specialized for code |
> **Important:** Only **one model at a time** can be loaded in GPU memory.
> When someone switches models, the previous one has to be unloaded and the new one loaded,
> which can take **several minutes**. The first prompt after a model switch is
> therefore noticeably slower. After that, the model stays loaded for a while.
---
## Python Example: Simple Prompt
The API follows the **OpenAI standard**, so the `openai` library can be used directly.
```bash
pip install openai
```
```python
from openai import OpenAI

API_KEY = "sk-..."  # insert your API key
BASE_URL = "http://141.75.33.244:8000/v1"
MODEL = "gemma4:31b"  # pick a model as needed

client = OpenAI(api_key=API_KEY, base_url=BASE_URL)

response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Explain the difference between L1 and L2 regularization."}
    ],
)
print(response.choices[0].message.content)
```
---
## Python Example: Listing Available Models
```python
from openai import OpenAI

API_KEY = "sk-..."
BASE_URL = "http://141.75.33.244:8000/v1"

client = OpenAI(api_key=API_KEY, base_url=BASE_URL)

models = client.models.list()
for m in models.data:
    print(m.id)
```
---
## Checking the Currently Loaded Model
Since only one model can be held in memory at a time, the following call shows which model is currently active:
```python
import httpx

r = httpx.get(
    "http://141.75.33.244:8000/api/ps",
    headers={"Authorization": "Bearer sk-..."},
)
print(r.json())
```
The response contains the model name, its size, and how long the model will remain in memory.
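The `/api/ps` response can also be inspected programmatically, for example to check whether the model you want is already loaded before sending a prompt. This sketch assumes the proxy passes Ollama's `/api/ps` JSON through unchanged, that is, a `models` list whose entries carry `name`, `model`, `size` (in bytes) and `expires_at`. `describe_loaded` and `is_loaded` are hypothetical helpers. They operate on the parsed JSON, e.g. `r.json()` from the call above.

```python
from typing import Any


def describe_loaded(ps: dict[str, Any]) -> list[str]:
    """One human-readable line per model currently held in memory."""
    lines = []
    for m in ps.get("models", []):
        size_gb = m.get("size", 0) / 1e9  # Ollama reports sizes in bytes
        lines.append(
            f"{m.get('name', '?')}: {size_gb:.1f} GB, unloads at {m.get('expires_at', 'unknown')}"
        )
    return lines


def is_loaded(ps: dict[str, Any], model: str) -> bool:
    """True if `model` is already in memory, so using it avoids a model switch."""
    return any(model in (m.get("name"), m.get("model")) for m in ps.get("models", []))
```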
---
## Usage Recommendations
- **Start with a small model** (`gemma4:31b` or `gpt-oss:20b`): much faster and sufficient for many tasks.
- Use a **large model** only for complex tasks (`qwen3.5:122b`, `gpt-oss:120b`).
- **Code tasks**: `qwen3-coder-next:q8_0` is specifically optimized for them.
- Where possible, use **the same model as other users** to avoid frequent model switches.
---
## Quotas
Depending on the API key, the following limits may be configured:
- Maximum **requests per day / month**
- Maximum **tokens per day / month**
When a limit is exceeded, the API returns status code `429 Too Many Requests`.
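Quotas are counted per day or month, so an immediate retry will not clear a `429`. A script should stop and report the error instead. The following is a minimal standard-library sketch that assumes the OpenAI-style request and response shapes used in the examples above. `chat` and `QuotaExceeded` are hypothetical names. The error body is passed through as-is because its exact format is not documented here. With the `openai` library, the same situation surfaces as `openai.RateLimitError`, after the client's built-in retries.

```python
import json
import urllib.error
import urllib.request


class QuotaExceeded(RuntimeError):
    """The proxy answered 429: a daily or monthly limit of this API key is used up."""


def chat(prompt: str, api_key: str, base_url: str, model: str = "gemma4:31b") -> str:
    """Send one chat request to the OpenAI-compatible endpoint and return the reply."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    try:
        # Generous timeout: the first prompt after a model switch can take minutes.
        with urllib.request.urlopen(req, timeout=600) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 429:
            # Retrying right away is pointless; the limit resets per day/month.
            raise QuotaExceeded(err.read().decode("utf-8", "replace")) from err
        raise
    return data["choices"][0]["message"]["content"]
```

Usage: `chat("Summarize this text: ...", "sk-...", "http://141.75.33.244:8000/v1")`.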
---
## Coding Assistant: opencode
[opencode](https://opencode.ai) is a terminal-based AI coding agent (similar to Claude Code) that supports OpenAI-compatible APIs and can therefore be pointed directly at the intranet service.
### Installation
```bash
npm install -g opencode-ai
# or
curl -fsSL https://opencode.ai/install | bash
```
### Configuration
Create a configuration file at `~/.config/opencode/config.json`:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "providers": {
    "openai": {
      "apiKey": "sk-...",
      "baseURL": "http://141.75.33.244:8000/v1"
    }
  },
  "model": "openai/qwen3-coder-next:q8_0"
}
```
For code tasks, `qwen3-coder-next:q8_0` is recommended; for general tasks, `gemma4:31b` or `gpt-oss:20b`.
### Starting
```bash
opencode
```
opencode opens an interactive terminal interface and can then be used in the project directory: reading files, generating code, suggesting refactorings, and so on.
---
## Coding Assistant: Claude Code
[Claude Code](https://claude.ai/code) is Anthropic's official AI coding agent for the terminal. If you already have Claude Code access, you can run it through the intranet service with local models, without sending data to Anthropic.
### Prerequisite
Active Claude Code access (Claude Pro or Team).
### Starting
```bash
ANTHROPIC_BASE_URL=http://141.75.33.244:8000 \
ANTHROPIC_AUTH_TOKEN=sk-... \
claude
```
The model to use is preconfigured by the admin via `ANTHROPIC_DEFAULT_MODEL`; no manual model selection is needed.
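Claude Code is not the only possible client: any program that speaks the Anthropic Messages API can talk to the same endpoint. The following is a standard-library sketch with several assumptions:
- The proxy accepts the `x-api-key` header, as listed in its features.
- It returns Anthropic-style responses with a `content` list of text blocks.
- The `anthropic-version` header is required by Anthropic's own API. It is included here on the assumption that the proxy tolerates it.

`build_messages_request`, `reply_text` and `ask` are hypothetical helpers. The model name only matters when no model lock or default mapping overrides it.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://141.75.33.244:8000"


def build_messages_request(
    prompt: str,
    api_key: str,
    base_url: str = BASE_URL,
    model: str = "gemma4:31b",
    max_tokens: int = 1024,
) -> urllib.request.Request:
    """Prepare a POST to /v1/messages in Anthropic's request format."""
    body = {
        "model": model,
        "max_tokens": max_tokens,  # mandatory field in the Anthropic format
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )


def reply_text(response: dict[str, Any]) -> str:
    """Concatenate the text blocks of an Anthropic-style response."""
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def ask(prompt: str, api_key: str) -> str:
    """Send one prompt and return the answer (performs a network call)."""
    req = build_messages_request(prompt, api_key)
    # Generous timeout: the first prompt after a model switch can take minutes.
    with urllib.request.urlopen(req, timeout=600) as resp:
        return reply_text(json.load(resp))
```

Usage: `print(ask("Explain the difference between L1 and L2 regularization.", "sk-..."))`.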
---
## Administration (Admins Only)
The web interface for managing API keys and quotas is available at:
**`http://141.75.33.244:8001`**
There, API keys can be created, deactivated, and assigned quotas.
### Model Lock for Lab Sessions
Under **Einstellungen → Aktives Modell (Lock)** (Settings → Active model (lock)), a model can be pinned. While a lock is set, the `model` field of every request is replaced by this model, regardless of what the client sends. This prevents uncoordinated model switches during a session, which would slow down all participants with long loading times.
Typical workflow for a lab session:
1. Before the session: load the appropriate model in Ollama
2. Activate the lock in the admin interface
3. After the session: deactivate the lock again (clear the field)

LICENSE (new file, 27 lines changed)

@@ -0,0 +1,27 @@
MIT License
Copyright (c) 2026 Oliver Hofmann
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---
Portions of this software were inspired by free-claude-code
(https://github.com/Alishahryar1/free-claude-code),
copyright (c) 2026 Ali Khokhar, MIT License.

README.md

@@ -1,15 +1,16 @@
 # Ollama Proxy with API Keys and Quotas
 A reverse proxy for Ollama with API key authentication, quota management and a web admin interface. Ollama itself has no authentication: anyone who can reach the API can use it. This project solves that problem. Ollama stays bound to `localhost` and is not reachable from outside. A proxy (port 8000) sits in front of it, checks every request for a valid API key and optionally enforces per-key token and request quotas. A web admin interface (port 8001) manages keys, quotas and Ollama settings.
 ## Features
-- API key authentication (Bearer token or `sk-` prefix)
+- API key authentication (Bearer token, `sk-` prefix, `x-api-key` and `anthropic-auth-token` headers)
 - Optional expiration date per API key
 - Quota management with separate daily and monthly limits (tokens & requests)
-- Token counting via tiktoken, reset boundaries in the Europe/Berlin time zone
+- Token counting via tiktoken, reset boundaries in the configured time zone
 - Web admin interface (manage API keys, Ollama settings, usage display)
 - OpenAI-compatible `/v1/chat/completions` endpoint with streaming and tool use
+- Anthropic Messages API `/v1/messages`, compatible with the Claude Code CLI and Anthropic SDK clients
 - Rotating usage logs
 - SQLite (default) or PostgreSQL
 - Docker image on DockerHub: `mediaeng/llmproxy`
@@ -19,7 +20,7 @@ A reverse proxy for Ollama with API key authentication, quota management
 - Admin interface password-protected (`ADMIN_PASSWORD`); all API endpoints require the token
 - API keys stored as SHA-256 hashes in the DB; the plaintext is shown only once, at creation
 - Atomic quota check with `SELECT FOR UPDATE` (no TOCTOU race)
-- Port 8001 can optionally be bound to `127.0.0.1` (additional hardening)
+- Admin port 8001 can be restricted to local access via `ADMIN_HOST=127.0.0.1`
 ## Configuration
@@ -29,12 +30,13 @@ A reverse proxy for Ollama with API key authentication, quota management
 ADMIN_PASSWORD=change-me
 PROXY_HOST=0.0.0.0
 PROXY_PORT=8000
+ADMIN_HOST=0.0.0.0
 ADMIN_PORT=8001
 DATABASE_URL=sqlite:///./test.db
 OLLAMA_URL=http://localhost:11434
-DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
 LOG_FILE=logs/usage.log
+ANTHROPIC_DEFAULT_MODEL=llama3
 ```
 | Variable | Default | Description |
@@ -42,13 +44,14 @@ LOG_FILE=logs/usage.log
 | `ADMIN_PASSWORD` | (none) | Password for the admin interface (**required**) |
 | `PROXY_HOST` | `0.0.0.0` | Bind address of the proxy |
 | `PROXY_PORT` | `8000` | Port of the proxy |
+| `ADMIN_HOST` | `0.0.0.0` | Bind address of the admin API (e.g. `127.0.0.1` for local access only) |
 | `ADMIN_PORT` | `8001` | Port of the admin API |
 | `DATABASE_URL` | `sqlite:///./test.db` | DB connection string (SQLite or PostgreSQL) |
 | `OLLAMA_URL` | `http://localhost:11434` | Address of the Ollama instance (also changeable in the UI) |
-| `DEFAULT_MODEL` | `llama3` | Default model for `/v1/chat/completions` (also changeable in the UI) |
 | `APP_TZ` | `Europe/Berlin` | Time zone for daily/monthly quota resets |
 | `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
 | `ALLOWED_ORIGINS` | `http://localhost:5173` | CORS origins (relevant for development only) |
+| `ANTHROPIC_DEFAULT_MODEL` | (none) | Default model for `/v1/messages` (Ollama model name) |
 ## Development (local)
@@ -78,6 +81,23 @@ The script checks all ports for conflicts, initializes the database and
 Admin interface: `http://localhost:5173`
+## Claude Code CLI
+The proxy provides an Anthropic-compatible endpoint through which the Claude Code CLI can be used with local Ollama models.
+```bash
+# Set ANTHROPIC_DEFAULT_MODEL in .env, then:
+./start_claude.sh
+# Or with the key as an argument:
+./start_claude.sh sk-your-api-key
+# Or as an environment variable:
+PROXY_API_KEY=sk-your-api-key ./start_claude.sh
+```
+The script sets `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` automatically from `.env` and launches `claude`.
 ## Production (Docker)
 ### Docker Compose (recommended)
@@ -86,7 +106,13 @@ Admin interface: `http://localhost:5173`
 docker compose up -d
 ```
-Pulls the image from DockerHub, loads variables from `.env` and uses the local SQLite database. For further Compose variants (PostgreSQL, Ollama as a container), see `DOCKERHUB.md`.
+Pulls the image from DockerHub and loads variables from `.env`.
+The setup uses `network_mode: host`: the container shares the host's network stack instead of getting its own virtual network. That is the right choice here for two reasons:
+1. **Ollama must not be reachable from outside.** Ollama runs on the host and is bound to `127.0.0.1:11434`, so it is reachable only locally. With a separate container network (bridge mode), `localhost` from the container's point of view would be the container itself, not the host. The usual alternative (`host.docker.internal` + `extra_hosts`) is unreliable on Linux.
+2. **No duplicate port mapping needed.** With `network_mode: host`, ports 8000 and 8001 are available directly on the host, without `ports:` entries in the Compose file.
 ### Building and pushing the image yourself
@@ -98,28 +124,90 @@ The script shows the current Git tag, offers to set a new one, builds
 ### Port 8001 (Admin)
-Port 8001 must be exposed because the container serves the admin interface on this port. All API endpoints require the `ADMIN_PASSWORD`; the token is the primary protection. Optional additional hardening: binding to `127.0.0.1`:
-```yaml
-ports:
-  - "127.0.0.1:8001:8001"  # local only
-  # or:
-  - "8001:8001"            # network-wide, protected by ADMIN_PASSWORD
+All admin endpoints require the `ADMIN_PASSWORD`; the token is the primary protection. For additional hardening, the admin API can be restricted to local access:
+```env
+ADMIN_HOST=127.0.0.1
 ```
+With `network_mode: host` (the production default), this is the only effective method, because Docker port mapping does not apply there.
+### HTTPS via reverse proxy (untested)
+To serve the proxy and the admin interface over HTTPS, another reverse proxy (e.g. Nginx or Caddy) can be placed in front of them. With `network_mode: host`, both services listen directly on the host, so Nginx/Caddy proxy to `localhost`.
+**Caddy** (recommended; automatic TLS via Let's Encrypt):
+```
+llm.example.com {
+    reverse_proxy localhost:8000 {
+        flush_interval -1
+    }
+}
+llm-admin.example.com {
+    reverse_proxy localhost:8001
+}
+```
+**Nginx** (with Certbot certificates):
+```nginx
+server {
+    listen 443 ssl;
+    server_name llm.example.com;
+    ssl_certificate     /etc/letsencrypt/live/llm.example.com/fullchain.pem;
+    ssl_certificate_key /etc/letsencrypt/live/llm.example.com/privkey.pem;
+    location / {
+        proxy_pass http://127.0.0.1:8000;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_buffering off;  # required for streaming
+        proxy_cache off;
+    }
+}
+server {
+    listen 443 ssl;
+    server_name llm-admin.example.com;
+    ssl_certificate     /etc/letsencrypt/live/llm-admin.example.com/fullchain.pem;
+    ssl_certificate_key /etc/letsencrypt/live/llm-admin.example.com/privkey.pem;
+    location / {
+        proxy_pass http://127.0.0.1:8001;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+    }
+}
+```
+Clients then configure `https://llm.example.com/v1` as the base URL.
 ## Proxy Endpoints (Port 8000)
-All endpoints require a valid API key in the `Authorization` header.
+All endpoints require a valid API key in the `Authorization` header (`Bearer sk-...`), the `x-api-key` header or the `anthropic-auth-token` header.
 ```bash
+# OpenAI-compatible endpoint
 curl -X POST http://localhost:8000/v1/chat/completions \
   -H "Authorization: Bearer sk-xxxxxx" \
   -H "Content-Type: application/json" \
   -d '{"model":"llama3","messages":[{"role":"user","content":"Hello"}]}'
+# Anthropic-compatible endpoint (e.g. for Claude Code)
+curl -X POST http://localhost:8000/v1/messages \
+  -H "x-api-key: sk-xxxxxx" \
+  -H "anthropic-version: 2023-06-01" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"llama3","messages":[{"role":"user","content":"Hello"}],"max_tokens":1024}'
 ```
 | Endpoint | Method | Description |
 |----------|---------|--------------|
+| `/v1/messages` | POST | Chat (Anthropic format, streaming + tool use) |
 | `/v1/chat/completions` | POST | Chat (OpenAI format, streaming + tool use) |
 | `/v1/models` | GET | Models (OpenAI format) |
 | `/api/generate` | POST | Ollama generate (native) |
@@ -167,7 +255,8 @@ llm_quota/
 │   └── tests/
 │       ├── conftest.py
 │       ├── test_auth.py
-│       └── test_quota.py
+│       ├── test_quota.py
+│       └── test_anthropic_messages.py
 ├── frontend/
 │   └── src/
 │       ├── main.jsx            # React admin UI
@@ -179,13 +268,19 @@ llm_quota/
 ├── docker-entrypoint.sh
 ├── .dockerignore
 ├── start.sh                # development start script
+├── start_claude.sh         # start the Claude Code CLI against the proxy
 ├── run_dev.py              # development runner for PyCharm
 ├── build_push.sh           # Docker build & push to DockerHub
+├── LICENSE
 ├── DOCKERHUB.md            # DockerHub description (German)
 ├── DOCKERHUB.en.md         # DockerHub description (English)
 └── .gitignore
 ```
+## Acknowledgements
+The Anthropic-compatible endpoint (`/v1/messages`) was inspired by the project [free-claude-code](https://github.com/Alishahryar1/free-claude-code) by Ali Khokhar, which takes a similar approach to forwarding Claude Code requests to alternative LLM backends.
 ## License
-MIT
+MIT, see [LICENSE](LICENSE)
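The README lists four places where the API key may appear. The proxy's `require_api_key` checks them in a fixed order. The following standalone sketch performs the same lookup on a plain header dict. `extract_api_key` is a hypothetical helper that returns `None` where the proxy would answer HTTP 401. The lowercasing step is needed only for a plain dict, because Starlette's request headers are already case-insensitive.

```python
from typing import Mapping, Optional


def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the API key from request headers, or None if none is present.

    Order: "Authorization: Bearer <key>", a bare "Authorization: sk-..." value,
    then "x-api-key", then "anthropic-auth-token".
    """
    # HTTP header names are case-insensitive; normalize them for a plain dict.
    h = {k.lower(): v for k, v in headers.items()}
    auth = h.get("authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    if auth.startswith("sk-"):
        return auth
    return h.get("x-api-key") or h.get("anthropic-auth-token") or None


print(extract_api_key({"x-api-key": "sk-abc"}))  # → sk-abc
```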


@@ -131,13 +131,16 @@ async def get_proxy_info(_ = Depends(require_admin_auth)):
     host = os.getenv("PROXY_HOST", "0.0.0.0")
     port = os.getenv("PROXY_PORT", "8000")
     display_host = "localhost" if host in ("0.0.0.0", "::") else host
-    return {"endpoint": f"http://{display_host}:{port}"}
+    return {
+        "endpoint": f"http://{display_host}:{port}",
+        "version": os.getenv("APP_VERSION", "dev"),
+    }
 @app.get("/api/settings", response_model=schemas.Settings)
 async def read_settings(db: Session = Depends(get_db), _ = Depends(require_admin_auth)):
     return schemas.Settings(
         ollama_url=crud.get_setting(db, "ollama_url", "http://localhost:11434"),
-        default_model=crud.get_setting(db, "default_model", "llama3"),
+        force_model=crud.get_setting(db, "force_model") or None,
     )
 @app.put("/api/settings", response_model=schemas.Settings)
@@ -148,8 +151,8 @@ async def update_settings(
 ):
     ollama_url = settings.ollama_url.rstrip('/').removesuffix('/v1')
     crud.set_setting(db, "ollama_url", ollama_url)
-    crud.set_setting(db, "default_model", settings.default_model)
-    return schemas.Settings(ollama_url=ollama_url, default_model=settings.default_model)
+    crud.set_setting(db, "force_model", settings.force_model or "")
+    return schemas.Settings(ollama_url=ollama_url, force_model=settings.force_model or None)
 @app.get("/api/ollama-models")
 async def get_ollama_models(
@@ -162,9 +165,21 @@ async def get_ollama_models(
         async with httpx.AsyncClient(timeout=5.0) as client:
             response = await client.get(f"{ollama_url}/api/tags")
             models = [m["name"] for m in response.json().get("models", [])]
+            return {"models": models, "reachable": True}
     except Exception:
-        models = []
-    return {"models": models}
+        return {"models": [], "reachable": False}
+@app.get("/api/logs/{name}")
+async def get_log_lines(name: str, _ = Depends(require_admin_auth)):
+    if name not in ("usage", "error"):
+        raise HTTPException(status_code=400, detail="name must be 'usage' or 'error'")
+    log_file = Path(os.getenv("LOG_FILE", "logs/usage.log"))
+    path = log_file if name == "usage" else log_file.parent / "error.log"
+    try:
+        lines = path.read_text(encoding="utf-8").splitlines()
+        return {"lines": lines[-10:]}
+    except FileNotFoundError:
+        return {"lines": []}
 # Serve the static frontend (production only, when dist/ exists)
 _dist = Path(__file__).parent.parent / "frontend" / "dist"
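`get_log_lines` reads the whole file and keeps the last ten lines. That is cheap here because the proxy's rotating handlers cap the files at 8 KB (usage) and 64 KB (error). If those caps were raised, a `collections.deque` with `maxlen` would keep memory bounded by the number of lines requested. The following is a sketch of that alternative; the name `tail_lines` is hypothetical.

```python
import tempfile
from collections import deque
from pathlib import Path


def tail_lines(path: Path, n: int = 10) -> list[str]:
    """Return the last n lines of a text file, or [] if it does not exist.

    The deque discards older lines as new ones arrive, so at most n are kept in memory.
    """
    try:
        with path.open(encoding="utf-8") as f:
            return [line.rstrip("\n") for line in deque(f, maxlen=n)]
    except FileNotFoundError:
        return []


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "usage.log"
    log.write_text("".join(f"line {i}\n" for i in range(25)), encoding="utf-8")
    last_three = tail_lines(log, 3)
    missing = tail_lines(Path(tmp) / "error.log")

print(last_three)  # → ['line 22', 'line 23', 'line 24']
```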


@@ -1,12 +1,20 @@
 import os
+from pathlib import Path
 from dotenv import load_dotenv
 from sqlalchemy import create_engine
-load_dotenv(dotenv_path=os.path.join(os.path.dirname(__file__), '..', '.env'))
+load_dotenv(dotenv_path=Path(__file__).resolve().parent.parent / ".env")
 from sqlalchemy.orm import sessionmaker, declarative_base
 DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///./test.db")
+# Always resolve relative SQLite paths relative to this file, not to the cwd
+if DATABASE_URL.startswith("sqlite:///") and not DATABASE_URL.startswith("sqlite:////"):
+    db_path = DATABASE_URL[len("sqlite:///"):]
+    if not os.path.isabs(db_path):
+        db_path = str(Path(__file__).resolve().parent / db_path)
+    DATABASE_URL = f"sqlite:///{db_path}"
 if "sqlite" in DATABASE_URL:
     engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
 else:
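The new block in this file rewrites relative SQLite URLs so they no longer depend on the working directory. The following sketch states the same rule as a pure function, which makes it easy to test. `resolve_sqlite_url` and its `base_dir` parameter are hypothetical; the module itself uses its own resolved directory, so `base_dir` is expected to be absolute here.

```python
import os
from pathlib import Path


def resolve_sqlite_url(url: str, base_dir: Path) -> str:
    """Anchor a relative sqlite:/// URL at base_dir; leave all other URLs unchanged.

    "sqlite:////abs/path.db" (four slashes) is already absolute and is not touched.
    """
    prefix = "sqlite:///"
    if url.startswith(prefix) and not url.startswith("sqlite:////"):
        db_path = url[len(prefix):]
        if not os.path.isabs(db_path):
            db_path = str(Path(base_dir) / db_path)  # pathlib drops the leading "./"
        return f"{prefix}{db_path}"
    return url


print(resolve_sqlite_url("sqlite:///./test.db", Path("/srv/app/backend")))
# → sqlite:////srv/app/backend/test.db
```

PostgreSQL URLs pass through untouched, which matches the `startswith("sqlite:///")` guard in the module.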


@@ -13,8 +13,6 @@ def init_db():
     db = SessionLocal()
     if not get_setting(db, "ollama_url"):
         set_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
-    if not get_setting(db, "default_model"):
-        set_setting(db, "default_model", os.getenv("DEFAULT_MODEL", "llama3"))
     db.close()
     print("Database initialized.")
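The proxy module in the next diff translates Ollama's `tool_calls` back into Anthropic `tool_use` content blocks. It accepts arguments either as an object or as a JSON string. The following is a reduced standalone sketch of just that mapping. The name `tool_calls_to_blocks` is hypothetical; the id scheme `toolu_<msg_id>_<index>` and the stop-reason rule follow the proxy code.

```python
import json


def tool_calls_to_blocks(tool_calls: list, msg_id: str) -> tuple[list, str]:
    """Map Ollama tool calls to Anthropic tool_use blocks and pick the stop reason."""
    blocks = []
    for i, tc in enumerate(tool_calls):
        fn = tc.get("function", {})
        args = fn.get("arguments", {})
        if isinstance(args, str):
            # Like the proxy, accept arguments serialized as a JSON string.
            try:
                args = json.loads(args)
            except json.JSONDecodeError:
                args = {}
        blocks.append({
            "type": "tool_use",
            "id": f"toolu_{msg_id}_{i}",
            "name": fn.get("name", ""),
            "input": args,
        })
    stop_reason = "tool_use" if blocks else "end_turn"
    return blocks, stop_reason


calls = [{"function": {"name": "read_file", "arguments": '{"path": "main.py"}'}}]
blocks, reason = tool_calls_to_blocks(calls, "abc123")
print(reason)              # → tool_use
print(blocks[0]["input"])  # → {'path': 'main.py'}
```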


@@ -1,5 +1,8 @@
+import json
 import logging
 import os
+import secrets
+import time
 from logging.handlers import RotatingFileHandler
 from pathlib import Path
@@ -10,29 +13,55 @@ from database import get_db
 import crud
 import httpx
+_log_dir = Path(os.getenv("LOG_FILE", "logs/usage.log")).parent
+_log_dir.mkdir(parents=True, exist_ok=True)
+_fmt = logging.Formatter("%(asctime)s | %(message)s", datefmt="%Y-%m-%d %H:%M:%S")
 # Rotating usage log (8 KB per file, 3 backups)
-_log_path = Path(os.getenv("LOG_FILE", "logs/usage.log"))
-_log_path.parent.mkdir(parents=True, exist_ok=True)
-_handler = RotatingFileHandler(str(_log_path), maxBytes=8192, backupCount=3, encoding="utf-8")
-_handler.setFormatter(logging.Formatter("%(asctime)s | %(message)s", datefmt="%Y-%m-%d %H:%M:%S"))
+_usage_handler = RotatingFileHandler(str(_log_dir / "usage.log"), maxBytes=8192, backupCount=3, encoding="utf-8")
+_usage_handler.setFormatter(_fmt)
 usage_log = logging.getLogger("proxy.usage")
 usage_log.setLevel(logging.INFO)
-usage_log.addHandler(_handler)
+usage_log.addHandler(_usage_handler)
 usage_log.propagate = False
+# Rotating error log (64 KB per file, 5 backups)
+_error_handler = RotatingFileHandler(str(_log_dir / "error.log"), maxBytes=65536, backupCount=5, encoding="utf-8")
+_error_handler.setFormatter(_fmt)
+error_log = logging.getLogger("proxy.error")
+error_log.setLevel(logging.ERROR)
+error_log.addHandler(_error_handler)
+error_log.propagate = False
+def _content_to_str(content) -> str:
+    """Normalize OpenAI content: string or array of content parts → plain string."""
+    if isinstance(content, list):
+        return " ".join(
+            part.get("text", "") if isinstance(part, dict) else str(part)
+            for part in content
+        )
+    return content or ""
 def _last_user_msg(messages: list, max_len: int = 120) -> str:
     for msg in reversed(messages):
         if msg.get("role") == "user":
-            text = (msg.get("content") or "").replace("\n", " ").strip()
+            text = _content_to_str(msg.get("content")).replace("\n", " ").strip()
             return text[:max_len] + ("…" if len(text) > max_len else "")
     return ""
 async def require_api_key(request: Request, db: Session = Depends(get_db)):
     auth_header = request.headers.get("Authorization", "")
+    x_api_key = request.headers.get("x-api-key", "")
+    auth_token = request.headers.get("anthropic-auth-token", "")
     if auth_header.startswith("Bearer "):
         api_key = auth_header[7:]
     elif auth_header.startswith("sk-"):
         api_key = auth_header
+    elif x_api_key:
+        api_key = x_api_key
+    elif auth_token:
+        api_key = auth_token
     else:
         raise HTTPException(status_code=401, detail="Invalid or missing API key")
     db_key = crud.verify_api_key(db, api_key)
@@ -43,15 +72,42 @@ async def require_api_key(request: Request, db: Session = Depends(get_db)):
 app = FastAPI(title="Ollama Proxy", dependencies=[Depends(require_api_key)])
+@app.on_event("startup")
+def apply_env_settings():
+    """At startup, copy env-configured values into the DB (overriding UI changes until the next restart)."""
+    db = next(get_db())
+    try:
+        if url := os.getenv("OLLAMA_URL"):
+            crud.set_setting(db, "ollama_url", url)
+        db.commit()
+    finally:
+        db.close()
+@app.exception_handler(Exception)
+async def unhandled_exception_handler(request: Request, exc: Exception):
+    error_log.error("Unhandled exception | %s %s | %s: %s",
+                    request.method, request.url.path, type(exc).__name__, exc, exc_info=exc)
+    return JSONResponse(status_code=500, content={"error": {"message": "Internal server error", "type": "server_error"}})
+def _backend_headers() -> dict:
+    key = os.getenv("BACKEND_API_KEY")
+    return {"Authorization": f"Bearer {key}"} if key else {}
 async def proxy_request(url: str, method: str = "GET", json_data: dict = None):
     async with httpx.AsyncClient(timeout=300.0) as client:
-        response = await client.request(method=method, url=url, json=json_data)
+        response = await client.request(method=method, url=url, json=json_data, headers=_backend_headers())
         return response
 @app.post("/api/generate")
 async def generate(request: Request, db: Session = Depends(get_db)):
     ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
     body = await request.json()
+    force_model = crud.get_setting(db, "force_model") or None
+    if force_model:
+        body = {**body, "model": force_model}
+    if not body.get("model"):
+        raise HTTPException(status_code=422, detail="Field 'model' is required")
     prompt_tokens = crud.count_tokens(body.get("prompt", ""))
     if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
@@ -60,23 +116,50 @@ async def generate(request: Request, db: Session = Depends(get_db)):
     prompt_preview = (body.get("prompt", "").replace("\n", " ").strip())[:120]
     usage_log.info('%s | /api/generate | %s | ~%d tokens | "%s"',
                    request.state.api_key_name, body.get("model", "?"), prompt_tokens, prompt_preview)
-    response = await proxy_request(f"{ollama_url}/api/generate", method="POST", json_data=body)
-    return JSONResponse(content=response.json(), status_code=response.status_code)
+    start = time.monotonic()
+    try:
+        response = await proxy_request(f"{ollama_url}/api/generate", method="POST", json_data=body)
+        resp_json = response.json()
+        usage_log.info('%s | /api/generate | %s | actual ↑%d ↓%d tokens | %.1fs',
+                       request.state.api_key_name, body.get("model", "?"),
+                       resp_json.get("prompt_eval_count", 0), resp_json.get("eval_count", 0),
+                       time.monotonic() - start)
+        return JSONResponse(content=resp_json, status_code=response.status_code)
+    except Exception as exc:
+        error_log.error("Proxy error | %s | /api/generate | %s | %s: %s",
+                        request.state.api_key_name, body.get("model", "?"), type(exc).__name__, exc, exc_info=exc)
+        raise
 @app.post("/api/chat")
 async def chat(request: Request, db: Session = Depends(get_db)):
     ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
     body = await request.json()
+    force_model = crud.get_setting(db, "force_model") or None
+    if force_model:
+        body = {**body, "model": force_model}
+    if not body.get("model"):
+        raise HTTPException(status_code=422, detail="Field 'model' is required")
     messages = body.get("messages", [])
-    prompt_tokens = sum(crud.count_tokens(msg.get("content") or "") for msg in messages)
+    prompt_tokens = sum(crud.count_tokens(_content_to_str(msg.get("content"))) for msg in messages)
     if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
         raise HTTPException(status_code=429, detail="Quota exceeded")
     usage_log.info('%s | /api/chat | %s | ~%d tokens | "%s"',
                    request.state.api_key_name, body.get("model", "?"), prompt_tokens, _last_user_msg(messages))
-    response = await proxy_request(f"{ollama_url}/api/chat", method="POST", json_data=body)
-    return JSONResponse(content=response.json(), status_code=response.status_code)
+    start = time.monotonic()
+    try:
+        response = await proxy_request(f"{ollama_url}/api/chat", method="POST", json_data=body)
+        resp_json = response.json()
+        usage_log.info('%s | /api/chat | %s | actual ↑%d ↓%d tokens | %.1fs',
+                       request.state.api_key_name, body.get("model", "?"),
+                       resp_json.get("prompt_eval_count", 0), resp_json.get("eval_count", 0),
+                       time.monotonic() - start)
+        return JSONResponse(content=resp_json, status_code=response.status_code)
+    except Exception as exc:
+        error_log.error("Proxy error | %s | /api/chat | %s | %s: %s",
+                        request.state.api_key_name, body.get("model", "?"), type(exc).__name__, exc, exc_info=exc)
+        raise
 @app.get("/api/tags")
 async def list_models(db: Session = Depends(get_db)):
@ -84,12 +167,226 @@ async def list_models(db: Session = Depends(get_db)):
response = await proxy_request(f"{ollama_url}/api/tags", method="GET") response = await proxy_request(f"{ollama_url}/api/tags", method="GET")
return JSONResponse(content=response.json(), status_code=response.status_code) return JSONResponse(content=response.json(), status_code=response.status_code)
@app.get("/version")
async def version():
return {"version": os.getenv("APP_VERSION", "dev")}
@app.get("/api/ps")
async def running_models(db: Session = Depends(get_db)):
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
response = await proxy_request(f"{ollama_url}/api/ps", method="GET")
return JSONResponse(content=response.json(), status_code=response.status_code)
@app.get("/api/versions") @app.get("/api/versions")
async def versions(db: Session = Depends(get_db)): async def versions(db: Session = Depends(get_db)):
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434")) ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
response = await proxy_request(f"{ollama_url}/api/versions", method="GET") response = await proxy_request(f"{ollama_url}/api/versions", method="GET")
return JSONResponse(content=response.json(), status_code=response.status_code) return JSONResponse(content=response.json(), status_code=response.status_code)
# --- Anthropic Messages API compatibility layer ---
def _anthropic_content_to_str(content) -> str:
"""Flatten Anthropic content (string or block array) to a plain string."""
if isinstance(content, str):
return content
if isinstance(content, list):
parts = []
for block in content:
if not isinstance(block, dict):
continue
if block.get("type") == "text":
parts.append(block.get("text", ""))
elif block.get("type") == "tool_result":
raw = block.get("content", "")
if isinstance(raw, list):
raw = " ".join(r.get("text", "") for r in raw if isinstance(r, dict) and r.get("type") == "text")
parts.append(str(raw))
return " ".join(parts)
return str(content) if content else ""
def _anthropic_messages_to_ollama(messages: list, system: str = None) -> list:
"""Transform Anthropic messages array to Ollama /api/chat format."""
result = []
if system:
result.append({"role": "system", "content": system})
for msg in messages:
role = msg.get("role")
content = msg.get("content")
if role == "assistant" and isinstance(content, list):
text = " ".join(b.get("text", "") for b in content if isinstance(b, dict) and b.get("type") == "text")
tool_calls = [
{"function": {"name": b["name"], "arguments": b.get("input", {})}}
for b in content if isinstance(b, dict) and b.get("type") == "tool_use"
]
entry = {"role": "assistant", "content": text}
if tool_calls:
entry["tool_calls"] = tool_calls
result.append(entry)
elif role == "user" and isinstance(content, list):
text_parts = []
for block in content:
if not isinstance(block, dict):
continue
if block.get("type") == "tool_result":
if text_parts:
result.append({"role": "user", "content": " ".join(text_parts)})
text_parts = []
raw = block.get("content", "")
if isinstance(raw, list):
raw = " ".join(r.get("text", "") for r in raw if isinstance(r, dict) and r.get("type") == "text")
result.append({"role": "tool", "content": str(raw)})
elif block.get("type") == "text":
text_parts.append(block.get("text", ""))
if text_parts:
result.append({"role": "user", "content": " ".join(text_parts)})
else:
result.append({"role": role, "content": _anthropic_content_to_str(content)})
return result
def _anthropic_tools_to_ollama(tools: list) -> list:
"""Transform Anthropic tools to Ollama/OpenAI function format."""
return [
{
"type": "function",
"function": {
"name": t["name"],
"description": t.get("description", ""),
"parameters": t.get("input_schema", {}),
},
}
for t in tools
]
def _ollama_to_anthropic_response(ollama_resp: dict, model_name: str, msg_id: str) -> dict:
    """Transform an Ollama /api/chat response to Anthropic Messages API format."""
    msg = ollama_resp.get("message", {})
    text = msg.get("content", "")
    tool_calls = msg.get("tool_calls") or []
    content_blocks = []
    if text:
        content_blocks.append({"type": "text", "text": text})
    stop_reason = "end_turn"
    for i, tc in enumerate(tool_calls):
        stop_reason = "tool_use"
        fn = tc.get("function", {})
        args = fn.get("arguments", {})
        if isinstance(args, str):
            try:
                args = json.loads(args)
            except json.JSONDecodeError:
                args = {}
        content_blocks.append({
            "type": "tool_use",
            "id": f"toolu_{msg_id}_{i}",
            "name": fn.get("name", ""),
            "input": args,
        })
    return {
        "id": f"msg_{msg_id}",
        "type": "message",
        "role": "assistant",
        "content": content_blocks,
        "model": model_name,
        "stop_reason": stop_reason,
        "stop_sequence": None,
        "usage": {
            "input_tokens": ollama_resp.get("prompt_eval_count", 0),
            "output_tokens": ollama_resp.get("eval_count", 0),
        },
    }
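`_ollama_to_anthropic_response` reports `end_turn` whenever there are no tool calls, so a reply cut off by a token limit looks the same as a finished one. Ollama's final `/api/chat` response carries a `done_reason` field; a sketch, assuming it reads `"length"` when the limit (Ollama's `num_predict` option) was hit, of mapping that to Anthropic's `max_tokens` stop reason. `anthropic_stop_reason` is a hypothetical helper.

```python
from typing import Any


def anthropic_stop_reason(ollama_resp: dict[str, Any]) -> str:
    """Pick an Anthropic stop_reason for a finished Ollama /api/chat response."""
    message = ollama_resp.get("message") or {}
    if message.get("tool_calls"):
        return "tool_use"
    # Assumption: Ollama sets done_reason to "length" when the token limit was reached.
    if ollama_resp.get("done_reason") == "length":
        return "max_tokens"
    return "end_turn"


print(anthropic_stop_reason({"message": {"content": "Hi!"}, "done_reason": "stop"}))  # end_turn
print(anthropic_stop_reason({"message": {"content": "Hi"}, "done_reason": "length"}))  # max_tokens
```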
@app.post("/v1/messages")
async def anthropic_messages(request: Request, db: Session = Depends(get_db)):
    ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
    body = await request.json()
    force_model = crud.get_setting(db, "force_model") or None
    model_name = force_model or os.getenv("ANTHROPIC_DEFAULT_MODEL") or body.get("model")
    if not model_name:
        raise HTTPException(status_code=422, detail="Field 'model' is required")
    anthropic_msgs = body.get("messages", [])
    system = body.get("system")
    system_str = _anthropic_content_to_str(system) if system else ""
    all_text = system_str + " ".join(_anthropic_content_to_str(m.get("content")) for m in anthropic_msgs)
    prompt_tokens = crud.count_tokens(all_text)
    if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
        raise HTTPException(status_code=429, detail="Quota exceeded")
    ollama_messages = _anthropic_messages_to_ollama(anthropic_msgs, system=system_str)
    ollama_body: dict = {"model": model_name, "messages": ollama_messages, "stream": body.get("stream", False)}
    if tools := body.get("tools"):
        ollama_body["tools"] = _anthropic_tools_to_ollama(tools)
    msg_id = secrets.token_hex(12)
    target = f"{ollama_url}/api/chat"
    usage_log.info('%s | /v1/messages | %s | ~%d tokens | "%s"',
                   request.state.api_key_name, model_name, prompt_tokens, _last_user_msg(ollama_messages))
    start = time.monotonic()
    if body.get("stream"):
        # The backend is always called non-streaming; the dev proxy builds the SSE stream itself.
        # This is necessary because upstream proxies (e.g. the production proxy) only expose
        # /api/chat in non-streaming mode.
        non_stream_body = {**ollama_body, "stream": False}
        async def generate():
            try:
                response = await proxy_request(target, method="POST", json_data=non_stream_body)
                ollama_resp = response.json()
            except Exception as exc:
                error_log.error("Stream error | %s | /v1/messages | %s | %s: %s",
                                request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
                raise
            msg = ollama_resp.get("message", {})
            text = msg.get("content", "")
            input_tokens = ollama_resp.get("prompt_eval_count", 0)
            output_tokens = ollama_resp.get("eval_count", 0)
            yield f"event: message_start\ndata: {json.dumps({'type': 'message_start', 'message': {'id': f'msg_{msg_id}', 'type': 'message', 'role': 'assistant', 'content': [], 'model': model_name, 'stop_reason': None, 'stop_sequence': None, 'usage': {'input_tokens': input_tokens, 'output_tokens': 0}}})}\n\n"
            yield f"event: content_block_start\ndata: {json.dumps({'type': 'content_block_start', 'index': 0, 'content_block': {'type': 'text', 'text': ''}})}\n\n"
            yield f"event: ping\ndata: {json.dumps({'type': 'ping'})}\n\n"
            if text:
                yield f"event: content_block_delta\ndata: {json.dumps({'type': 'content_block_delta', 'index': 0, 'delta': {'type': 'text_delta', 'text': text}})}\n\n"
            yield f"event: content_block_stop\ndata: {json.dumps({'type': 'content_block_stop', 'index': 0})}\n\n"
            yield f"event: message_delta\ndata: {json.dumps({'type': 'message_delta', 'delta': {'stop_reason': 'end_turn', 'stop_sequence': None}, 'usage': {'output_tokens': output_tokens}})}\n\n"
            yield f"event: message_stop\ndata: {json.dumps({'type': 'message_stop'})}\n\n"
            usage_log.info('%s | /v1/messages | %s | actual ↑%d%d tokens | %.1fs',
                           request.state.api_key_name, model_name,
                           input_tokens, output_tokens,
                           time.monotonic() - start)
        return StreamingResponse(
            generate(),
            media_type="text/event-stream",
            headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
        )
    try:
        response = await proxy_request(target, method="POST", json_data=ollama_body)
        result = _ollama_to_anthropic_response(response.json(), model_name, msg_id)
        usage_log.info('%s | /v1/messages | %s | actual ↑%d%d tokens | %.1fs',
                       request.state.api_key_name, model_name,
                       result["usage"]["input_tokens"], result["usage"]["output_tokens"],
                       time.monotonic() - start)
        return JSONResponse(content=result, status_code=response.status_code)
    except Exception as exc:
        error_log.error("Proxy error | %s | /v1/messages | %s | %s: %s",
                        request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
        raise
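The streaming branch of `anthropic_messages` emits a single text block at index 0 and hard-codes `stop_reason: end_turn` in `message_delta`, so tool calls returned by the backend are dropped in stream mode, although the non-streaming path converts them via `_ollama_to_anthropic_response`. A sketch of the extra events, assuming Anthropic's documented streaming shape for tool use (`content_block_start` with an empty `input`, then `input_json_delta` fragments, then `content_block_stop`). The ids reuse the `toolu_{msg_id}_{i}` scheme; `sse` and `tool_use_events` are hypothetical helpers.

```python
import json
from collections.abc import Iterator
from typing import Any


def sse(event: str, payload: dict[str, Any]) -> str:
    """Format one server-sent event in the same "event:/data:" shape the handler uses."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def tool_use_events(tool_calls: list[dict[str, Any]], msg_id: str, first_index: int = 1) -> Iterator[str]:
    """Yield start/delta/stop events for each Ollama tool call, after the text block at index 0."""
    for i, tc in enumerate(tool_calls):
        index = first_index + i
        fn = tc.get("function", {})
        args = fn.get("arguments", {})
        if isinstance(args, str):  # arguments may arrive as a JSON string
            try:
                args = json.loads(args)
            except json.JSONDecodeError:
                args = {}
        yield sse("content_block_start", {
            "type": "content_block_start",
            "index": index,
            "content_block": {"type": "tool_use", "id": f"toolu_{msg_id}_{i}",
                              "name": fn.get("name", ""), "input": {}},
        })
        # The complete argument object is sent as one input_json_delta fragment.
        yield sse("content_block_delta", {
            "type": "content_block_delta",
            "index": index,
            "delta": {"type": "input_json_delta", "partial_json": json.dumps(args)},
        })
        yield sse("content_block_stop", {"type": "content_block_stop", "index": index})


calls = [{"function": {"name": "bash", "arguments": {"command": "ls"}}}]
events = list(tool_use_events(calls, msg_id="abc"))
stop_reason = "tool_use" if calls else "end_turn"  # value for the message_delta event
for e in events:
    print(e, end="")
```

In `generate()` these events would be yielded after the text block's `content_block_stop`, with the computed `stop_reason` replacing the hard-coded `'end_turn'` in `message_delta`.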
@app.get("/v1/models") @app.get("/v1/models")
async def list_openai_models(db: Session = Depends(get_db)): async def list_openai_models(db: Session = Depends(get_db)):
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434")) ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
@@ -99,35 +396,74 @@ async def list_openai_models(db: Session = Depends(get_db)):
@app.post("/v1/chat/completions") @app.post("/v1/chat/completions")
async def openai_chat_completions(request: Request, db: Session = Depends(get_db)): async def openai_chat_completions(request: Request, db: Session = Depends(get_db)):
ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434")) ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
default_model = crud.get_setting(db, "default_model", os.getenv("DEFAULT_MODEL", "llama3"))
body = await request.json() body = await request.json()
force_model = crud.get_setting(db, "force_model") or None
if force_model:
body = {**body, "model": force_model}
if not body.get("model"):
raise HTTPException(status_code=422, detail="Field 'model' is required")
messages = body.get("messages", []) messages = body.get("messages", [])
prompt_tokens = sum(crud.count_tokens(msg.get("content") or "") for msg in messages) prompt_tokens = sum(crud.count_tokens(_content_to_str(msg.get("content"))) for msg in messages)
if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1): if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
raise HTTPException(status_code=429, detail="Quota exceeded") raise HTTPException(status_code=429, detail="Quota exceeded")
if "model" not in body:
body = {**body, "model": default_model}
model_name = body["model"] model_name = body["model"]
usage_log.info('%s | /v1/chat/completions | %s | ~%d tokens | "%s"', usage_log.info('%s | /v1/chat/completions | %s | ~%d tokens | "%s"',
request.state.api_key_name, model_name, prompt_tokens, _last_user_msg(messages)) request.state.api_key_name, model_name, prompt_tokens, _last_user_msg(messages))
target = f"{ollama_url}/v1/chat/completions" target = f"{ollama_url}/v1/chat/completions"
if body.get("stream"): if body.get("stream"):
existing_opts = body.get("stream_options") or {}
stream_body = {**body, "stream_options": {**existing_opts, "include_usage": True}}
start = time.monotonic()
usage_tokens = {"prompt": 0, "completion": 0}
async def generate(): async def generate():
try:
async with httpx.AsyncClient(timeout=300.0) as client: async with httpx.AsyncClient(timeout=300.0) as client:
async with client.stream("POST", target, json=body) as resp: async with client.stream("POST", target, json=stream_body, headers=_backend_headers()) as resp:
async for chunk in resp.aiter_bytes(): async for chunk in resp.aiter_bytes():
try:
for line in chunk.decode("utf-8", errors="ignore").splitlines():
if line.startswith("data: ") and "[DONE]" not in line:
data = json.loads(line[6:])
if u := data.get("usage"):
usage_tokens["prompt"] = u.get("prompt_tokens", 0)
usage_tokens["completion"] = u.get("completion_tokens", 0)
except Exception:
pass
yield chunk yield chunk
except Exception as exc:
error_log.error("Stream error | %s | /v1/chat/completions | %s | %s: %s",
request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
raise
finally:
usage_log.info('%s | /v1/chat/completions | %s | actual ↑%d%d tokens | %.1fs',
request.state.api_key_name, model_name,
usage_tokens["prompt"], usage_tokens["completion"],
time.monotonic() - start)
return StreamingResponse( return StreamingResponse(
generate(), generate(),
media_type="text/event-stream", media_type="text/event-stream",
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"}, headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
) )
start = time.monotonic()
try:
response = await proxy_request(target, method="POST", json_data=body) response = await proxy_request(target, method="POST", json_data=body)
return JSONResponse(content=response.json(), status_code=response.status_code) resp_json = response.json()
usage = resp_json.get("usage", {})
usage_log.info('%s | /v1/chat/completions | %s | actual ↑%d%d tokens | %.1fs',
request.state.api_key_name, model_name,
usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0),
time.monotonic() - start)
return JSONResponse(content=resp_json, status_code=response.status_code)
except Exception as exc:
error_log.error("Proxy error | %s | /v1/chat/completions | %s | %s: %s",
request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
raise
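The usage scraper in the streaming `generate()` of `/v1/chat/completions` decodes each `aiter_bytes()` chunk on its own. httpx yields raw network chunks, so a `data:` line can be split across two chunks, fail `json.loads`, and be swallowed by the broad `except`; since the `try` appears to wrap the whole loop, one bad line also skips the rest of that chunk. If this hits the final usage chunk requested via `include_usage`, the "actual" log line reports 0 tokens. A sketch of a line-buffered scanner that carries the incomplete tail over to the next chunk; `UsageScanner` is a hypothetical class.

```python
import codecs
import json
from typing import Any


class UsageScanner:
    """Line-buffered scan of an OpenAI-style SSE stream for the last usage object."""

    def __init__(self) -> None:
        # An incremental decoder keeps multi-byte UTF-8 characters intact across chunk borders.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="ignore")
        self._buffer = ""
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def feed(self, chunk: bytes) -> None:
        self._buffer += self._decoder.decode(chunk)
        # Everything before the last "\n" is a complete line; keep the tail for the next chunk.
        *complete, self._buffer = self._buffer.split("\n")
        for line in complete:
            self._handle_line(line.strip())

    def _handle_line(self, line: str) -> None:
        if not line.startswith("data: ") or line == "data: [DONE]":
            return
        try:
            data: Any = json.loads(line[len("data: "):])
        except json.JSONDecodeError:
            return  # skip only this line, not the rest of the chunk
        if isinstance(data, dict) and (usage := data.get("usage")):
            self.prompt_tokens = usage.get("prompt_tokens", 0)
            self.completion_tokens = usage.get("completion_tokens", 0)


scanner = UsageScanner()
stream = b'data: {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 34}}\n\ndata: [DONE]\n\n'
for part in (stream[:30], stream[30:]):  # simulate one data line split across two network chunks
    scanner.feed(part)
print(scanner.prompt_tokens, scanner.completion_tokens)  # 12 34
```

Inside `generate()`, each chunk would be passed to `scanner.feed(chunk)` before `yield chunk`, so the client still receives the bytes unchanged and the `finally` block reads the counts from the scanner.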

@@ -40,7 +40,7 @@ class QuotaUpdate(BaseModel):
class Settings(BaseModel): class Settings(BaseModel):
ollama_url: str ollama_url: str
default_model: str force_model: Optional[str] = None
class UsageStats(BaseModel): class UsageStats(BaseModel):
tokens_used_today: int = 0 tokens_used_today: int = 0

@@ -0,0 +1,59 @@
import os
import pytest
from fastapi.testclient import TestClient
os.environ.setdefault("ADMIN_PASSWORD", "test-admin-pw")
os.environ.setdefault("OLLAMA_URL", "http://127.0.0.1:9999")
@pytest.fixture
def client(tmp_path):
    log_file = tmp_path / "usage.log"
    log_file.write_text("\n".join(f"Zeile {i}" for i in range(1, 16)) + "\n")
    (tmp_path / "error.log").write_text("Fehler A\nFehler B\n")
    os.environ["LOG_FILE"] = str(log_file)
    from database import Base, engine
    Base.metadata.drop_all(bind=engine)
    Base.metadata.create_all(bind=engine)
    from admin import app
    yield TestClient(app, raise_server_exceptions=False)
    Base.metadata.drop_all(bind=engine)
    os.environ.pop("LOG_FILE", None)
AUTH = {"Authorization": "Bearer test-admin-pw"}
def test_logs_usage_returns_last_10_lines(client):
    resp = client.get("/api/logs/usage", headers=AUTH)
    assert resp.status_code == 200
    lines = resp.json()["lines"]
    assert len(lines) == 10
    assert lines[-1] == "Zeile 15"
    assert lines[0] == "Zeile 6"
def test_logs_error_returns_content(client):
    resp = client.get("/api/logs/error", headers=AUTH)
    assert resp.status_code == 200
    assert resp.json()["lines"] == ["Fehler A", "Fehler B"]
def test_logs_missing_file_returns_empty(client, tmp_path):
    os.environ["LOG_FILE"] = str(tmp_path / "nonexistent.log")
    resp = client.get("/api/logs/usage", headers=AUTH)
    assert resp.status_code == 200
    assert resp.json()["lines"] == []
def test_logs_invalid_name_returns_400(client):
    resp = client.get("/api/logs/secret", headers=AUTH)
    assert resp.status_code == 400
def test_logs_requires_auth(client):
    resp = client.get("/api/logs/usage")
    assert resp.status_code == 401
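These tests pin down the behaviour of `GET /api/logs/{name}`: only `usage` and `error` are valid names (anything else is a 400), the last 10 lines are returned without line endings, and a missing file yields an empty list. The endpoint itself is not shown in this part of the diff; a minimal stdlib sketch of the file handling the tests expect, assuming (as the fixture suggests) that `error.log` sits in the same directory as the file named by `LOG_FILE`. `log_path`, `tail_lines` and the `logs/usage.log` fallback are hypothetical; the endpoint would translate the `ValueError` into a 400 response.

```python
import os
import tempfile
from collections import deque
from pathlib import Path

ALLOWED_LOGS = {"usage", "error"}  # any other name should be answered with HTTP 400


def log_path(name: str) -> Path:
    """Resolve a log name; assumes error.log lives next to the file named by LOG_FILE."""
    if name not in ALLOWED_LOGS:
        raise ValueError(f"unknown log {name!r}")
    usage_log = Path(os.environ.get("LOG_FILE", "logs/usage.log"))  # hypothetical default
    return usage_log if name == "usage" else usage_log.with_name("error.log")


def tail_lines(path: Path, n: int = 10) -> list[str]:
    """Return the last n lines without line endings, or [] if the file does not exist."""
    try:
        with path.open(encoding="utf-8", errors="replace") as fh:
            # deque(maxlen=n) streams through the file and keeps only the last n lines.
            return [line.rstrip("\r\n") for line in deque(fh, maxlen=n)]
    except FileNotFoundError:
        return []


with tempfile.TemporaryDirectory() as tmp:
    os.environ["LOG_FILE"] = str(Path(tmp) / "usage.log")
    log_path("usage").write_text("".join(f"Zeile {i}\n" for i in range(1, 16)))
    usage_lines = tail_lines(log_path("usage"))
    error_lines = tail_lines(log_path("error"))  # error.log was never written
print(usage_lines[0], usage_lines[-1], error_lines)  # Zeile 6 Zeile 15 []
```

Resolving `LOG_FILE` on every call, rather than at import time, is what lets `test_logs_missing_file_returns_empty` redirect the path after the app has been created.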

@@ -0,0 +1,272 @@
import json
import os
from unittest.mock import AsyncMock, MagicMock, patch, call
def _make_body(model="llama3", messages=None, stream=False, **kwargs):
    body = {
        "model": model,
        "messages": messages or [{"role": "user", "content": "Hello"}],
        "max_tokens": 100,
    }
    if stream:
        body["stream"] = True
    body.update(kwargs)
    return body
def _ollama_chat_response(content="Hi!", input_tokens=5, output_tokens=3):
    return {
        "model": "llama3",
        "message": {"role": "assistant", "content": content},
        "prompt_eval_count": input_tokens,
        "eval_count": output_tokens,
        "done": True,
    }
# --- Auth ---
def test_messages_missing_auth_returns_401(test_client):
    response = test_client.post("/v1/messages", json=_make_body())
    assert response.status_code == 401
def test_messages_invalid_key_returns_401(test_client):
    response = test_client.post(
        "/v1/messages",
        headers={"x-api-key": "sk-invalid"},
        json=_make_body(),
    )
    assert response.status_code == 401
@patch("main.proxy_request", new_callable=AsyncMock)
def test_messages_accepts_anthropic_auth_token_header(mock_proxy, test_client):
    mock_proxy.return_value.status_code = 200
    mock_proxy.return_value.json = lambda: _ollama_chat_response()
    response = test_client.post(
        "/v1/messages",
        headers={"anthropic-auth-token": os.environ.get("TEST_API_KEY", "")},
        json=_make_body(),
    )
    assert response.status_code == 200
@patch("main.proxy_request", new_callable=AsyncMock)
def test_messages_accepts_x_api_key_header(mock_proxy, test_client):
    mock_proxy.return_value.status_code = 200
    mock_proxy.return_value.json = lambda: _ollama_chat_response()
    response = test_client.post(
        "/v1/messages",
        headers={"x-api-key": os.environ.get("TEST_API_KEY", "")},
        json=_make_body(),
    )
    assert response.status_code == 200
# --- Validation ---
def test_messages_missing_model_returns_422(test_client):
    env = {k: v for k, v in os.environ.items() if k != "ANTHROPIC_DEFAULT_MODEL"}
    with patch.dict(os.environ, env, clear=True):
        response = test_client.post(
            "/v1/messages",
            headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
            json={"messages": [{"role": "user", "content": "Hi"}], "max_tokens": 100},
        )
    assert response.status_code == 422
@patch("main.proxy_request", new_callable=AsyncMock)
def test_messages_anthropic_default_model_used_when_no_model_in_request(mock_proxy, test_client):
    mock_proxy.return_value.status_code = 200
    mock_proxy.return_value.json = lambda: _ollama_chat_response()
    with patch.dict(os.environ, {"ANTHROPIC_DEFAULT_MODEL": "qwen3-coder:q8_0"}):
        test_client.post(
            "/v1/messages",
            headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
            json={"messages": [{"role": "user", "content": "Hi"}], "max_tokens": 100},
        )
    sent_body = mock_proxy.call_args[1]["json_data"]
    assert sent_body["model"] == "qwen3-coder:q8_0"
# --- Quota ---
def test_messages_quota_exceeded_returns_429(test_client):
    with patch("main.crud.check_and_increment_quota", return_value=False):
        response = test_client.post(
            "/v1/messages",
            headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
            json=_make_body(),
        )
    assert response.status_code == 429
# --- Response format ---
@patch("main.proxy_request", new_callable=AsyncMock)
def test_messages_returns_anthropic_format(mock_proxy, test_client):
    mock_proxy.return_value.status_code = 200
    mock_proxy.return_value.json = lambda: _ollama_chat_response("Hello!")
    response = test_client.post(
        "/v1/messages",
        headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
        json=_make_body(),
    )
    assert response.status_code == 200
    data = response.json()
    assert data["type"] == "message"
    assert data["role"] == "assistant"
    assert isinstance(data["content"], list)
    assert data["content"][0]["type"] == "text"
    assert data["content"][0]["text"] == "Hello!"
    assert data["usage"]["input_tokens"] == 5
    assert data["usage"]["output_tokens"] == 3
# --- Request transformation ---
@patch("main.proxy_request", new_callable=AsyncMock)
def test_messages_system_prompt_becomes_first_system_message(mock_proxy, test_client):
    mock_proxy.return_value.status_code = 200
    mock_proxy.return_value.json = lambda: _ollama_chat_response()
    test_client.post(
        "/v1/messages",
        headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
        json=_make_body(system="You are helpful"),
    )
    sent_body = mock_proxy.call_args[1]["json_data"]
    assert sent_body["messages"][0]["role"] == "system"
    assert sent_body["messages"][0]["content"] == "You are helpful"
@patch("main.proxy_request", new_callable=AsyncMock)
def test_messages_tools_transformed_to_ollama_function_format(mock_proxy, test_client):
    mock_proxy.return_value.status_code = 200
    mock_proxy.return_value.json = lambda: _ollama_chat_response()
    test_client.post(
        "/v1/messages",
        headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
        json=_make_body(tools=[{
            "name": "bash",
            "description": "Run bash",
            "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}},
        }]),
    )
    sent_body = mock_proxy.call_args[1]["json_data"]
    assert sent_body["tools"][0]["type"] == "function"
    assert sent_body["tools"][0]["function"]["name"] == "bash"
    assert "parameters" in sent_body["tools"][0]["function"]
@patch("main.proxy_request", new_callable=AsyncMock)
def test_messages_tool_call_response_transformed_to_anthropic(mock_proxy, test_client):
    mock_proxy.return_value.status_code = 200
    mock_proxy.return_value.json = lambda: {
        "model": "llama3",
        "message": {
            "role": "assistant",
            "content": "",
            "tool_calls": [{"function": {"name": "bash", "arguments": {"command": "ls"}}}],
        },
        "prompt_eval_count": 10,
        "eval_count": 5,
        "done": True,
    }
    response = test_client.post(
        "/v1/messages",
        headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
        json=_make_body(),
    )
    data = response.json()
    assert data["stop_reason"] == "tool_use"
    tool_block = next(b for b in data["content"] if b["type"] == "tool_use")
    assert tool_block["name"] == "bash"
    assert tool_block["input"] == {"command": "ls"}
# --- Streaming ---
@patch("main.proxy_request", new_callable=AsyncMock)
def test_messages_streaming_returns_anthropic_sse_events(mock_proxy, test_client):
    mock_proxy.return_value.status_code = 200
    mock_proxy.return_value.json = lambda: {
        "model": "llama3",
        "message": {"role": "assistant", "content": "Hi!"},
        "prompt_eval_count": 5,
        "eval_count": 3,
        "done": True,
    }
    response = test_client.post(
        "/v1/messages",
        headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
        json=_make_body(stream=True),
    )
    assert response.status_code == 200
    events = [
        json.loads(line[6:])
        for line in response.text.splitlines()
        if line.startswith("data: ")
    ]
    event_types = [e["type"] for e in events]
    assert "message_start" in event_types
    assert "content_block_start" in event_types
    assert "content_block_delta" in event_types
    assert "message_stop" in event_types
    deltas = [e for e in events if e["type"] == "content_block_delta"]
    text = "".join(d["delta"]["text"] for d in deltas)
    assert text == "Hi!"
# --- Backend-Auth (BACKEND_API_KEY) ---
def test_proxy_request_forwards_backend_api_key(test_client):
    with patch("main.httpx.AsyncClient") as mock_cls:
        mock_response = MagicMock()
        mock_response.status_code = 200
        mock_response.json.return_value = {"result": "ok"}
        mock_instance = AsyncMock()
        mock_instance.__aenter__ = AsyncMock(return_value=mock_instance)
        mock_instance.__aexit__ = AsyncMock(return_value=False)
        mock_instance.request = AsyncMock(return_value=mock_response)
        mock_cls.return_value = mock_instance
        with patch.dict(os.environ, {"BACKEND_API_KEY": "sk-backend-secret"}):
            test_client.post(
                "/api/generate",
                headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
                json={"model": "llama3", "prompt": "hi"},
            )
    _, kwargs = mock_instance.request.call_args
    assert kwargs.get("headers", {}).get("Authorization") == "Bearer sk-backend-secret"
def test_proxy_request_omits_auth_header_when_no_backend_key(test_client):
    with patch("main.httpx.AsyncClient") as mock_cls:
        mock_response = MagicMock()
        mock_response.status_code = 200
        mock_response.json.return_value = {"result": "ok"}
        mock_instance = AsyncMock()
        mock_instance.__aenter__ = AsyncMock(return_value=mock_instance)
        mock_instance.__aexit__ = AsyncMock(return_value=False)
        mock_instance.request = AsyncMock(return_value=mock_response)
        mock_cls.return_value = mock_instance
        env_without_key = {k: v for k, v in os.environ.items() if k != "BACKEND_API_KEY"}
        with patch.dict(os.environ, env_without_key, clear=True):
            test_client.post(
                "/api/generate",
                headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
                json={"model": "llama3", "prompt": "hi"},
            )
    _, kwargs = mock_instance.request.call_args
    assert "Authorization" not in kwargs.get("headers", {})
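The two backend-auth tests fix the contract of the header helper used when forwarding to the backend (the streaming path of `/v1/chat/completions` calls `_backend_headers()`, whose body is not shown here): with `BACKEND_API_KEY` set, an `Authorization: Bearer <key>` header is sent; without it, no `Authorization` header at all. A minimal sketch that satisfies that contract; `backend_headers` is a hypothetical stand-in, not the project's implementation.

```python
import os


def backend_headers() -> dict[str, str]:
    """Build the headers for requests to the backend from BACKEND_API_KEY."""
    # Read the variable on every call so patch.dict(os.environ, ...) in the tests takes effect.
    key = os.environ.get("BACKEND_API_KEY")
    return {"Authorization": f"Bearer {key}"} if key else {}


os.environ["BACKEND_API_KEY"] = "sk-backend-secret"
print(backend_headers())  # {'Authorization': 'Bearer sk-backend-secret'}
del os.environ["BACKEND_API_KEY"]
print(backend_headers())  # {}
```

Returning an empty dict rather than `None` lets callers pass the result straight to `headers=` in both cases.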

@@ -5,17 +5,24 @@ cd "$(dirname "$0")"
IMAGE=mediaeng/llmproxy IMAGE=mediaeng/llmproxy
PLATFORM=linux/arm64 PLATFORM=linux/arm64
CURRENT=$(git describe --tags --always) LAST_TAG=$(git describe --tags --abbrev=0 2>/dev/null || true)
if [ -z "$CURRENT" ]; then HEAD_TAG=$(git tag --points-at HEAD | head -1)
echo "Fehler: git describe liefert kein Ergebnis"
if [ -n "$HEAD_TAG" ]; then
echo "HEAD bereits getaggt: $HEAD_TAG"
read -rp "Neuer Tag [${HEAD_TAG}]: " INPUT
VERSION="${INPUT:-$HEAD_TAG}"
else
echo "Letzter Tag: ${LAST_TAG:-kein Tag}"
read -rp "Neuer Tag: " INPUT
if [ -z "$INPUT" ]; then
echo "Kein Tag angegeben, breche ab."
exit 1 exit 1
fi fi
VERSION="$INPUT"
fi
echo "Aktueller Tag: $CURRENT" if [ "$VERSION" != "$HEAD_TAG" ]; then
read -rp "Neuer Tag [${CURRENT}]: " INPUT
VERSION="${INPUT:-$CURRENT}"
if [ "$VERSION" != "$CURRENT" ]; then
git tag "$VERSION" git tag "$VERSION"
git push origin "$VERSION" git push origin "$VERSION"
echo "Tag '$VERSION' gesetzt und gepusht." echo "Tag '$VERSION' gesetzt und gepusht."
@@ -30,6 +37,7 @@ echo ""
docker buildx build \ docker buildx build \
--platform "$PLATFORM" \ --platform "$PLATFORM" \
--push \ --push \
--build-arg APP_VERSION="$VERSION" \
-t "$IMAGE:$VERSION" \ -t "$IMAGE:$VERSION" \
-t "$IMAGE:latest" \ -t "$IMAGE:latest" \
. .

@@ -1,11 +1,11 @@
services: services:
llmproxy: llmproxy:
image: mediaeng/llmproxy:latest image: mediaeng/llmproxy:latest
container_name: llmproxy
restart: unless-stopped restart: unless-stopped
network_mode: host
env_file: .env env_file: .env
ports:
- "${PROXY_PORT:-8000}:${PROXY_PORT:-8000}"
- "127.0.0.1:8001:8001"
volumes: volumes:
- ./backend/test.db:/app/backend/test.db - ./backend/test.db:/app/backend/test.db
- ./backend/logs:/app/backend/logs - ./backend/logs:/app/backend/logs

@@ -10,7 +10,7 @@ uvicorn main:app \
PROXY_PID=$! PROXY_PID=$!
uvicorn admin:app \ uvicorn admin:app \
--host "0.0.0.0" \ --host "${ADMIN_HOST:-0.0.0.0}" \
--port "${ADMIN_PORT:-8001}" & --port "${ADMIN_PORT:-8001}" &
ADMIN_PID=$! ADMIN_PID=$!

@@ -76,13 +76,17 @@ const EMPTY_KEY_FORM = {
name: '', expires_at: '', daily_tokens: '', monthly_tokens: '', daily_requests: '', monthly_requests: '', name: '', expires_at: '', daily_tokens: '', monthly_tokens: '', daily_requests: '', monthly_requests: '',
}; };
function SettingsSection({ password }) { function SettingsSection({ password, refreshKey }) {
const [settings, setSettings] = useState(null); const [settings, setSettings] = useState(null);
const [availableModels, setAvailableModels] = useState([]); const [availableModels, setAvailableModels] = useState([]);
const [modelsLoading, setModelsLoading] = useState(false); const [modelsLoading, setModelsLoading] = useState(false);
const [ollamaReachable, setOllamaReachable] = useState(true);
const [proxyEndpoint, setProxyEndpoint] = useState(null); const [proxyEndpoint, setProxyEndpoint] = useState(null);
const [appVersion, setAppVersion] = useState(null);
const [saved, setSaved] = useState(false); const [saved, setSaved] = useState(false);
const [error, setError] = useState(null); const [error, setError] = useState(null);
const [usageLog, setUsageLog] = useState([]);
const [errorLog, setErrorLog] = useState([]);
const fetchModels = async (url, currentModel) => { const fetchModels = async (url, currentModel) => {
setModelsLoading(true); setModelsLoading(true);
@@ -91,12 +95,14 @@ function SettingsSection({ password }) {
headers: authHeaders(password), headers: authHeaders(password),
params: url ? { url } : {}, params: url ? { url } : {},
}); });
const models = res.data.models; const { models, reachable } = res.data;
setOllamaReachable(reachable);
setAvailableModels(models); setAvailableModels(models);
if (models.length > 0 && !models.includes(currentModel)) { if (models.length > 0 && currentModel && !models.includes(currentModel)) {
setSettings(s => ({ ...s, default_model: models[0] })); setSettings(s => ({ ...s, force_model: models[0] }));
} }
} catch { } catch {
setOllamaReachable(false);
setAvailableModels([]); setAvailableModels([]);
} finally { } finally {
setModelsLoading(false); setModelsLoading(false);
@@ -105,16 +111,25 @@ function SettingsSection({ password }) {
useEffect(() => { useEffect(() => {
const headers = authHeaders(password); const headers = authHeaders(password);
Promise.all([ Promise.allSettled([
axios.get('/api/settings', { headers }), axios.get('/api/settings', { headers }),
axios.get('/api/proxy-info', { headers }), axios.get('/api/proxy-info', { headers }),
]).then(([settingsRes, proxyRes]) => { axios.get('/api/logs/usage', { headers }),
const s = settingsRes.data; axios.get('/api/logs/error', { headers }),
]).then(([settingsRes, proxyRes, usageRes, errorRes]) => {
if (settingsRes.status === 'rejected' || proxyRes.status === 'rejected') {
setError('Einstellungen konnten nicht geladen werden.');
return;
}
const s = settingsRes.value.data;
setSettings(s); setSettings(s);
setProxyEndpoint(proxyRes.data.endpoint); setProxyEndpoint(proxyRes.value.data.endpoint);
fetchModels(s.ollama_url, s.default_model); setAppVersion(proxyRes.value.data.version);
}).catch(() => setError('Einstellungen konnten nicht geladen werden.')); if (usageRes.status === 'fulfilled') setUsageLog(usageRes.value.data.lines);
}, []); if (errorRes.status === 'fulfilled') setErrorLog(errorRes.value.data.lines);
fetchModels(s.ollama_url, s.force_model);
});
}, [refreshKey]);
const handleSave = async (e) => { const handleSave = async (e) => {
e.preventDefault(); e.preventDefault();
@@ -142,42 +157,61 @@ function SettingsSection({ password }) {
<small> (Änderung erfordert Neustart)</small> <small> (Änderung erfordert Neustart)</small>
</span> </span>
</div> </div>
<div className="settings-row">
<label>Version</label>
<span className="settings-value">{appVersion ?? '…'}</span>
</div>
<div className="settings-row"> <div className="settings-row">
<label>Ollama-Endpunkt</label> <label>Ollama-Endpunkt</label>
<div className="settings-input-wrap">
<input <input
type="url" type="url"
value={settings.ollama_url} value={settings.ollama_url}
onChange={(e) => setSettings({ ...settings, ollama_url: e.target.value })} onChange={(e) => setSettings({ ...settings, ollama_url: e.target.value })}
onBlur={(e) => fetchModels(e.target.value, settings.default_model)} onBlur={(e) => fetchModels(e.target.value, settings.force_model)}
placeholder="http://localhost:11434" placeholder="http://localhost:11434"
required required
/> />
{!ollamaReachable && !modelsLoading && (
<div className="warning"> Ollama nicht erreichbar unter {settings.ollama_url}</div>
)}
</div>
</div> </div>
<div className="settings-row"> <div className="settings-row">
<label>Standard-Modell</label> <label>Aktives Modell (Lock)</label>
{modelsLoading ? ( {modelsLoading ? (
<span className="settings-value">Lade Modelle</span> <span className="settings-value">Lade Modelle</span>
) : availableModels.length > 0 ? ( ) : availableModels.length > 0 ? (
<select <select
value={settings.default_model} value={settings.force_model || ""}
onChange={(e) => setSettings({ ...settings, default_model: e.target.value })} onChange={(e) => setSettings({ ...settings, force_model: e.target.value || null })}
> >
<option value=""> kein Lock </option>
{availableModels.map(m => <option key={m} value={m}>{m}</option>)} {availableModels.map(m => <option key={m} value={m}>{m}</option>)}
</select> </select>
) : ( ) : (
<input <input
type="text" type="text"
value={settings.default_model} value={settings.force_model || ""}
onChange={(e) => setSettings({ ...settings, default_model: e.target.value })} onChange={(e) => setSettings({ ...settings, force_model: e.target.value || null })}
placeholder="llama3" placeholder="leer = kein Lock"
required
/> />
)} )}
</div> </div>
{error && <div className="error">{error}</div>} {error && <div className="error">{error}</div>}
{saved && <div className="success">Gespeichert.</div>} {saved && <div className="success">Gespeichert.</div>}
<button type="submit">Speichern</button> <button type="submit">Änderungen übernehmen</button>
</form> </form>
<div className="log-section">
<h3>Nutzungslog (letzte 10 Einträge)</h3>
<pre className="log-pre">{usageLog.length > 0 ? usageLog.join('\n') : '— keine Einträge —'}</pre>
{errorLog.length > 0 && (
<>
<h3>Fehlerlog (letzte 10 Einträge)</h3>
<pre className="log-pre log-pre-error">{errorLog.join('\n')}</pre>
</>
)}
</div>
</section> </section>
); );
} }
@@ -192,21 +226,31 @@ function App() {
const [creating, setCreating] = useState(false); const [creating, setCreating] = useState(false);
const [editKey, setEditKey] = useState(null); const [editKey, setEditKey] = useState(null);
const [editForm, setEditForm] = useState({}); const [editForm, setEditForm] = useState({});
const [refreshKey, setRefreshKey] = useState(0);
useEffect(() => { const [lastUpdated, setLastUpdated] = useState(null);
if (!password) { setLoading(false); return; }
fetchApiKeys().finally(() => setLoading(false));
}, [password]);
const fetchApiKeys = async () => { const fetchApiKeys = async () => {
try { try {
const res = await axios.get('/api/api-keys', { headers: authHeaders(password) }); const res = await axios.get('/api/api-keys', { headers: authHeaders(password) });
setApiKeys(res.data); setApiKeys(res.data);
setLastUpdated(new Date());
} catch { } catch {
setError('API-Keys konnten nicht geladen werden.'); setError('API-Keys konnten nicht geladen werden.');
} }
}; };
useEffect(() => {
if (!password) { setLoading(false); return; }
fetchApiKeys().finally(() => setLoading(false));
const timer = setInterval(() => {
fetchApiKeys();
setRefreshKey(k => k + 1);
}, 5 * 60 * 1000);
return () => clearInterval(timer);
}, [password]);
const handleCreate = async (e) => { const handleCreate = async (e) => {
e.preventDefault(); e.preventDefault();
setCreating(true); setCreating(true);
@@ -287,6 +331,7 @@ function App() {
const logout = () => { const logout = () => {
sessionStorage.removeItem('admin_password'); sessionStorage.removeItem('admin_password');
setLastUpdated(null);
setPassword(null); setPassword(null);
}; };
@@ -298,10 +343,17 @@ function App() {
<div className="container"> <div className="container">
<div className="header"> <div className="header">
<h1>Ollama Proxy Admin</h1> <h1>Ollama Proxy Admin</h1>
<div className="header-right">
{lastUpdated && (
<span className="last-updated">
Aktualisiert: {lastUpdated.toLocaleTimeString('de-DE', { hour: '2-digit', minute: '2-digit' })}
</span>
)}
<button onClick={logout}>Abmelden</button> <button onClick={logout}>Abmelden</button>
</div> </div>
</div>
<SettingsSection password={password} /> <SettingsSection password={password} refreshKey={refreshKey} />
<section> <section>
<h2>Neuer API-Key</h2> <h2>Neuer API-Key</h2>

@@ -182,6 +182,7 @@ tr:hover {
.settings-row label { .settings-row label {
width: 160px; width: 160px;
flex-shrink: 0; flex-shrink: 0;
font-size: 14px;
font-weight: 500; font-weight: 500;
color: #2c3e50; color: #2c3e50;
} }
@@ -194,6 +195,31 @@ tr:hover {
font-size: 14px; font-size: 14px;
} }
.settings-input-wrap {
flex: 1;
display: flex;
flex-direction: column;
gap: 4px;
}
.settings-input-wrap input {
width: 100%;
padding: 8px 10px;
border: 1px solid #ccc;
border-radius: 4px;
font-size: 14px;
box-sizing: border-box;
}
.warning {
color: #b8520a;
background: #fff3e0;
border: 1px solid #e67e22;
border-radius: 4px;
padding: 6px 10px;
font-size: 13px;
}
.settings-form button { .settings-form button {
align-self: flex-start; align-self: flex-start;
padding: 8px 20px; padding: 8px 20px;
@@ -383,7 +409,7 @@ tr:hover {
.edit-form label small { .edit-form label small {
font-weight: 400; font-weight: 400;
color: #999; color: #999;
font-size: 11px; font-size: 12px;
} }
.edit-form input { .edit-form input {
@@ -427,3 +453,46 @@ tr:hover {
 .btn-cancel:hover {
 background: #7f8c8d;
 }
+.log-section {
+margin-top: 24px;
+border-top: 1px solid #eee;
+padding-top: 16px;
+}
+.log-section h3 {
+font-size: 14px;
+font-weight: 600;
+color: #34495e;
+margin: 0 0 6px;
+}
+.log-pre {
+background: #1e2a35;
+color: #c8d6df;
+font-family: 'Menlo', 'Consolas', monospace;
+font-size: 11px;
+line-height: 1.6;
+padding: 10px 14px;
+border-radius: 4px;
+margin: 0 0 14px;
+overflow-x: auto;
+white-space: pre;
+}
+.log-pre-error {
+background: #2d1b1b;
+color: #f5a0a0;
+margin-bottom: 0;
+}
+.header-right {
+display: flex;
+align-items: center;
+gap: 16px;
+}
+.last-updated {
+font-size: 12px;
+color: #95a5a6;
+}

@@ -11,6 +11,7 @@ export default defineConfig({
 '/api/settings': 'http://localhost:8001',
 '/api/ollama-models': 'http://localhost:8001',
 '/api/proxy-info': 'http://localhost:8001',
+'/api/logs': 'http://localhost:8001',
 '/api': 'http://localhost:8000',
 },
 },

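The new '/api/logs' entry is placed before the catch-all '/api' key. The sketch below assumes the dev-server proxy tries its keys in declaration order and stops at the first key that is a prefix of the request path. That is my understanding of Vite's proxy middleware, not something this diff states. It is a minimal Python model of that assumption, not Vite's code. The log name "proxy" in the example path is hypothetical.

```python
from __future__ import annotations

# Proxy table copied from the Vite config hunk above; Python dicts keep insertion order.
PROXY = {
    "/api/settings": "http://localhost:8001",
    "/api/ollama-models": "http://localhost:8001",
    "/api/proxy-info": "http://localhost:8001",
    "/api/logs": "http://localhost:8001",
    "/api": "http://localhost:8000",
}


def resolve_target(path: str, table: dict[str, str]) -> str | None:
    """Return the backend of the first key that is a prefix of `path`.

    Models (as an assumption) a proxy that tries keys in declaration order
    and stops at the first match; None means no key matched.
    """
    for prefix, target in table.items():
        if path.startswith(prefix):
            return target
    return None


if __name__ == "__main__":
    print(resolve_target("/api/logs/proxy", PROXY))  # http://localhost:8001 (admin API)
    print(resolve_target("/api/generate", PROXY))    # http://localhost:8000 (proxy)

    # Declaring the catch-all first would shadow every admin route.
    shadowed = {"/api": "http://localhost:8000"}
    shadowed.update({k: v for k, v in PROXY.items() if k != "/api"})
    print(resolve_target("/api/logs/proxy", shadowed))  # http://localhost:8000
```

Under this model, appending '/api/logs' after '/api' would send log requests to the proxy on port 8000 instead of the admin API.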
@@ -40,13 +40,14 @@ def main():
 proxy_host = os.environ.get('PROXY_HOST', '0.0.0.0')
 proxy_port = os.environ.get('PROXY_PORT', '8000')
+admin_host = os.environ.get('ADMIN_HOST', '127.0.0.1')
 admin_port = os.environ.get('ADMIN_PORT', '8001')
 print('Initialisiere Datenbank...')
 subprocess.run([str(python), 'init_db.py'], cwd=backend, check=True)
 print(f'Starte Proxy → http://{proxy_host}:{proxy_port}')
-print(f'Starte Admin-API → http://127.0.0.1:{admin_port}')
+print(f'Starte Admin-API → http://{admin_host}:{admin_port}')
 print('Starte Frontend → http://localhost:5173')
 env = {**os.environ, 'PYTHONUNBUFFERED': '1'}
@@ -59,7 +60,7 @@ def main():
 ), 'Proxy ', '34'), # blau
 (subprocess.Popen(
 [str(python), '-m', 'uvicorn', 'admin:app', '--reload',
-'--host', '127.0.0.1', '--port', admin_port],
+'--host', admin_host, '--port', admin_port],
 cwd=backend, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=env,
 ), 'Admin ', '33'), # gelb
 (subprocess.Popen(

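start.py pairs each `subprocess.Popen` with a label and an ANSI colour code: '34' is blue and '33' is yellow. The code that consumes these tuples is not part of this diff. The sketch below shows one common way to use them: one reader thread per child, each line prefixed with a coloured label. `prefix_line`, `pump` and the "| " separator are hypothetical, and the child commands are stand-ins for the uvicorn processes. The `PYTHONUNBUFFERED=1` entry seen in the hunk above matters here: without it, Python children block-buffer output written to a pipe, so lines would arrive in bursts.

```python
import os
import subprocess
import sys
import threading

_print_lock = threading.Lock()  # keeps lines from different children from interleaving


def prefix_line(label: str, color: str, line: str) -> str:
    """Wrap `label` in an ANSI SGR colour code ('34' blue, '33' yellow) and reset after it."""
    return f"\033[{color}m{label}\033[0m| {line}"


def pump(proc: subprocess.Popen, label: str, color: str) -> None:
    """Forward each line of the child's merged stdout/stderr with a coloured prefix."""
    assert proc.stdout is not None  # requires stdout=subprocess.PIPE
    for raw in proc.stdout:  # binary pipe, yields one line at a time until EOF
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        with _print_lock:
            sys.stdout.write(prefix_line(label, color, text) + "\n")
            sys.stdout.flush()
    proc.stdout.close()


if __name__ == "__main__":
    env = {**os.environ, "PYTHONUNBUFFERED": "1"}
    # Stand-in children; the real script launches uvicorn and the frontend instead.
    children = [
        (subprocess.Popen([sys.executable, "-c", "print('proxy up')"],
                          stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=env),
         "Proxy ", "34"),
        (subprocess.Popen([sys.executable, "-c", "print('admin up')"],
                          stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=env),
         "Admin ", "33"),
    ]
    threads = [threading.Thread(target=pump, args=c, daemon=True) for c in children]
    for t in threads:
        t.start()
    for proc, _, _ in children:
        proc.wait()
    for t in threads:
        t.join()
```

The fixed-width labels ('Proxy ', 'Admin ') in the original keep the prefixed columns aligned.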
@@ -1,8 +0,0 @@
-#!/usr/bin/env python3
-"""Pytest runner for Ollama Proxy tests."""
-import subprocess
-import sys
-if __name__ == "__main__":
-result = subprocess.run([sys.executable, "-m", "pytest"] + sys.argv[1:], cwd="backend")
-sys.exit(result.returncode)

@@ -1,17 +1,19 @@
 #!/bin/bash
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 # .env laden
-if [ -f .env ]; then
+if [ -f "$SCRIPT_DIR/.env" ]; then
 set -a
-source .env
+source "$SCRIPT_DIR/.env"
 set +a
 fi
 # Virtuelle Umgebung aktivieren falls vorhanden
-if [ -f .venv/bin/activate ]; then
-source .venv/bin/activate
-elif [ -f venv/bin/activate ]; then
-source venv/bin/activate
+if [ -f "$SCRIPT_DIR/.venv/bin/activate" ]; then
+source "$SCRIPT_DIR/.venv/bin/activate"
+elif [ -f "$SCRIPT_DIR/venv/bin/activate" ]; then
+source "$SCRIPT_DIR/venv/bin/activate"
 fi
 if [ -z "$ADMIN_PASSWORD" ]; then
@@ -21,6 +23,7 @@ fi
 PROXY_HOST=${PROXY_HOST:-0.0.0.0}
 PROXY_PORT=${PROXY_PORT:-8000}
+ADMIN_HOST=${ADMIN_HOST:-127.0.0.1}
 ADMIN_PORT=${ADMIN_PORT:-8001}
 FRONTEND_PORT=5173
@@ -60,9 +63,8 @@ cd backend
 python3 -m uvicorn main:app --reload --host "$PROXY_HOST" --port "$PROXY_PORT" &
 PIDS+=($!)
-# Admin-API immer nur lokal erreichbar (Host nicht konfigurierbar)
-echo "Starte Admin-API auf 127.0.0.1:${ADMIN_PORT}..."
-python3 -m uvicorn admin:app --reload --host 127.0.0.1 --port "$ADMIN_PORT" &
+echo "Starte Admin-API auf ${ADMIN_HOST}:${ADMIN_PORT}..."
+python3 -m uvicorn admin:app --reload --host "$ADMIN_HOST" --port "$ADMIN_PORT" &
 PIDS+=($!)
 cd ..
@@ -75,7 +77,7 @@ PIDS+=($!)
 cd ..
 echo "Backend läuft (Port $PROXY_PORT)"
-echo "Admin-API läuft (Port $ADMIN_PORT, nur lokal)"
+echo "Admin-API läuft (${ADMIN_HOST}:${ADMIN_PORT})"
 echo "Admin-Oberfläche: http://localhost:$FRONTEND_PORT"
 wait

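Both launchers now read ADMIN_HOST, but with different fallback rules. start.sh uses `${ADMIN_HOST:-127.0.0.1}`, which applies the default when the variable is unset *or* empty. start.py uses `os.environ.get('ADMIN_HOST', '127.0.0.1')`, which applies it only when the variable is unset. So if ADMIN_HOST is exported but empty, start.py passes an empty string to `--host`. The helper names below are hypothetical, and plain dicts stand in for the environment.

```python
from __future__ import annotations

from collections.abc import Mapping


def shell_colon_default(env: Mapping[str, str], name: str, default: str) -> str:
    """Model of ${NAME:-default}: fall back when NAME is unset *or* empty."""
    return env.get(name) or default


def dict_get_default(env: Mapping[str, str], name: str, default: str) -> str:
    """What os.environ.get(name, default) does: fall back only when NAME is unset."""
    return env.get(name, default)


if __name__ == "__main__":
    exported_but_empty = {"ADMIN_HOST": ""}  # e.g. a bare `ADMIN_HOST=` line in a sourced .env
    print(repr(shell_colon_default(exported_but_empty, "ADMIN_HOST", "127.0.0.1")))  # '127.0.0.1'
    print(repr(dict_get_default(exported_but_empty, "ADMIN_HOST", "127.0.0.1")))     # ''
```

In Bash, `${NAME-default}` (without the colon) has the same unset-only behaviour as `dict.get`. In Python, `os.environ.get(name) or default` reproduces the `:-` form.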
start_claude.sh Executable file

@@ -0,0 +1,33 @@
+#!/bin/bash
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+# .env laden
+if [ -f "$SCRIPT_DIR/.env" ]; then
+set -a
+source "$SCRIPT_DIR/.env"
+set +a
+fi
+# API-Key: erstes Argument hat Vorrang, sonst Umgebungsvariable PROXY_API_KEY
+API_KEY="${1:-$PROXY_API_KEY}"
+if [ -z "$API_KEY" ]; then
+echo "Fehler: Kein API-Key angegeben."
+echo "Verwendung: ./start_claude.sh sk-dein-key"
+echo " oder: PROXY_API_KEY=sk-dein-key ./start_claude.sh"
+exit 1
+fi
+# 0.0.0.0 ist eine Bind-Adresse, kein gültiger Client-Host
+PROXY_HOST="${PROXY_HOST:-0.0.0.0}"
+PROXY_PORT="${PROXY_PORT:-8000}"
+if [ "$PROXY_HOST" = "0.0.0.0" ]; then
+PROXY_HOST="localhost"
+fi
+export ANTHROPIC_BASE_URL="http://${PROXY_HOST}:${PROXY_PORT}"
+export ANTHROPIC_AUTH_TOKEN="$API_KEY"
+echo "Verbinde mit Proxy: $ANTHROPIC_BASE_URL"
+exec claude

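start_claude.sh takes the API key from its first argument and falls back to PROXY_API_KEY. It also rewrites the bind address 0.0.0.0 to localhost, because 0.0.0.0 is a listen address, not a host a client can reliably connect to. It then exports ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN and replaces itself with `claude`. The sketch below reproduces only that derivation as a pure Python function, so the precedence rules can be checked in isolation. `claude_env` is a hypothetical name. Unlike the script, it raises `ValueError` instead of printing usage and exiting with status 1.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence


def claude_env(argv: Sequence[str], env: Mapping[str, str]) -> dict[str, str]:
    """Derive the variables start_claude.sh exports before `exec claude`."""
    # ${1:-$PROXY_API_KEY}: a non-empty first argument wins over the environment.
    api_key = (argv[1] if len(argv) > 1 else "") or env.get("PROXY_API_KEY", "")
    if not api_key:
        raise ValueError("no API key: pass it as the first argument or set PROXY_API_KEY")
    host = env.get("PROXY_HOST") or "0.0.0.0"  # ${PROXY_HOST:-0.0.0.0}
    port = env.get("PROXY_PORT") or "8000"     # ${PROXY_PORT:-8000}
    # 0.0.0.0 tells a server to listen on all interfaces; clients should dial localhost.
    if host == "0.0.0.0":
        host = "localhost"
    return {
        "ANTHROPIC_BASE_URL": f"http://{host}:{port}",
        "ANTHROPIC_AUTH_TOKEN": api_key,
    }


if __name__ == "__main__":
    print(claude_env(["start_claude.sh", "sk-test"], {"PROXY_API_KEY": "sk-env"}))
    # {'ANTHROPIC_BASE_URL': 'http://localhost:8000', 'ANTHROPIC_AUTH_TOKEN': 'sk-test'}
```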
@@ -1,7 +0,0 @@
-curl -X POST http://localhost:8000/api/generate \
--H "Authorization: sk-admin-key" \
--H "Content-Type: application/json" \
--d '{
-"model": "llama3",
-"prompt": "Test"
-}'