Oliver Hofmann 34b108f4df Replace default_model with force_model (model lock)

Removes DEFAULT_MODEL in favour of a force_model setting configurable
via the admin UI. When set, every proxy request's model field is
overridden, preventing uncoordinated model switches during lab sessions.
Updates schemas, admin API, all three proxy endpoints, frontend,
init_db, and docs (README, DOCKERHUB, KURZANLEITUNG).
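
The model lock described above can be sketched as a small body rewrite. This is only an illustration under assumptions: the helper name `apply_force_model` is hypothetical, and the actual proxy code may apply the override differently.

```python
import json
from typing import Optional


def apply_force_model(body: bytes, force_model: Optional[str]) -> bytes:
    """Return the JSON request body with "model" overridden when a lock is set.

    Hypothetical helper, not taken from the repository.
    """
    if not force_model:
        # No lock configured: pass the request through unchanged.
        return body
    payload = json.loads(body)
    payload["model"] = force_model  # the client's choice is replaced
    return json.dumps(payload).encode("utf-8")
```

With a lock of `"qwen"`, a request asking for `"llama3"` would be forwarded with `"model": "qwen"`; with no lock, the bytes are returned untouched.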

2026-05-08 08:02:16 +02:00

README.md

Ollama Proxy with API Keys and Quotas

Ollama has no built-in authentication: anyone who can reach the API can use it. This project solves that problem. Ollama stays bound to localhost and is not reachable from outside. In front of it runs a proxy (port 8000) that checks every request for a valid API key and optionally enforces token and request quotas per key. A web admin interface (port 8001) lets you manage keys, quotas, and Ollama settings.

Features

API key authentication (Bearer token or sk- prefix)
Optional expiry date per API key
Quota management with separate daily and monthly limits (tokens and requests)
Token counting via tiktoken; reset boundaries in the Europe/Berlin time zone
Web admin interface (manage API keys, Ollama settings, usage display)
OpenAI-compatible /v1/chat/completions endpoint with streaming and tool use
Rotating usage logs
SQLite (default) or PostgreSQL
Docker image on DockerHub: mediaeng/llmproxy
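
To illustrate what reset boundaries in Europe/Berlin mean for the daily and monthly limits, here is a minimal sketch using only the standard library. The function name `quota_window_starts` is an assumption for illustration and is not taken from the project's `crud.py`.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

APP_TZ = ZoneInfo("Europe/Berlin")


def quota_window_starts(now: datetime) -> tuple[datetime, datetime]:
    """Return (start of current day, start of current month) in APP_TZ.

    `now` must be timezone-aware. Usage recorded at or after these
    instants would count against the current daily/monthly quota.
    """
    local = now.astimezone(APP_TZ)
    day_start = local.replace(hour=0, minute=0, second=0, microsecond=0)
    month_start = day_start.replace(day=1)
    return day_start, month_start


# 22:30 UTC on 7 May is already 00:30 on 8 May in Berlin (CEST, UTC+2),
# so the daily window has just rolled over.
day, month = quota_window_starts(datetime(2026, 5, 7, 22, 30, tzinfo=ZoneInfo("UTC")))
print(day.isoformat())    # 2026-05-08T00:00:00+02:00
print(month.isoformat())  # 2026-05-01T00:00:00+02:00
```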

Security

Admin interface is password-protected (ADMIN_PASSWORD); all API endpoints require the token
API keys are stored in the DB as SHA-256 hashes; the plaintext is shown only once, at creation
Quota check is atomic via SELECT FOR UPDATE (no TOCTOU race)
Admin port 8001 can be restricted to local access via ADMIN_HOST=127.0.0.1
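
The hashed-key storage listed above could look roughly like the following. This is a sketch under assumptions: the function names and the key length are hypothetical, and only the SHA-256 hashing and the `sk-` prefix come from this README.

```python
import hashlib
import hmac
import secrets


def hash_api_key(plaintext: str) -> str:
    """Hex SHA-256 digest of the key; this is what would be stored in the DB."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def generate_api_key() -> tuple[str, str]:
    """Create a new key and return (plaintext, stored_hash).

    The plaintext is shown to the user once and never persisted.
    The 32-byte length is an illustrative choice.
    """
    plaintext = "sk-" + secrets.token_urlsafe(32)
    return plaintext, hash_api_key(plaintext)


def key_matches(presented: str, stored_hash: str) -> bool:
    """Compare in constant time to avoid leaking digest prefixes via timing."""
    return hmac.compare_digest(hash_api_key(presented), stored_hash)
```

Because only the digest is stored, a lost key cannot be recovered from the database; a new key has to be issued instead.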

Configuration

Create a .env file in the project directory:

ADMIN_PASSWORD=change-me
PROXY_HOST=0.0.0.0
PROXY_PORT=8000
ADMIN_HOST=0.0.0.0
ADMIN_PORT=8001
DATABASE_URL=sqlite:///./test.db
OLLAMA_URL=http://localhost:11434
APP_TZ=Europe/Berlin
LOG_FILE=logs/usage.log

Variable	Default	Description
`ADMIN_PASSWORD`	(none)	Password for the admin interface (required)
`PROXY_HOST`	`0.0.0.0`	Bind address of the proxy
`PROXY_PORT`	`8000`	Port of the proxy
`ADMIN_HOST`	`0.0.0.0`	Bind address of the admin API (e.g. `127.0.0.1` for local access only)
`ADMIN_PORT`	`8001`	Port of the admin API
`DATABASE_URL`	`sqlite:///./test.db`	DB connection string (SQLite or PostgreSQL)
`OLLAMA_URL`	`http://localhost:11434`	Address of the Ollama instance (can also be changed in the UI)
`APP_TZ`	`Europe/Berlin`	Time zone for daily/monthly quota resets
`LOG_FILE`	`logs/usage.log`	Path of the rotating usage log file
`ALLOWED_ORIGINS`	`http://localhost:5173`	CORS origins (only relevant for development)
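
As a rough illustration of how these variables and their defaults fit together, the sketch below reads them from an environment mapping. The `Settings` class and `load_settings` function are hypothetical names; the backend may load its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    admin_password: str
    proxy_host: str
    proxy_port: int
    admin_host: str
    admin_port: int
    database_url: str
    ollama_url: str
    app_tz: str
    log_file: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from env, falling back to the defaults in the table above."""
    password = env.get("ADMIN_PASSWORD")
    if not password:
        # ADMIN_PASSWORD has no default and is documented as required.
        raise RuntimeError("ADMIN_PASSWORD must be set")
    return Settings(
        admin_password=password,
        proxy_host=env.get("PROXY_HOST", "0.0.0.0"),
        proxy_port=int(env.get("PROXY_PORT", "8000")),
        admin_host=env.get("ADMIN_HOST", "0.0.0.0"),
        admin_port=int(env.get("ADMIN_PORT", "8001")),
        database_url=env.get("DATABASE_URL", "sqlite:///./test.db"),
        ollama_url=env.get("OLLAMA_URL", "http://localhost:11434"),
        app_tz=env.get("APP_TZ", "Europe/Berlin"),
        log_file=env.get("LOG_FILE", "logs/usage.log"),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check, for example `load_settings({"ADMIN_PASSWORD": "x"}).admin_port` yields `8001`.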

Development (local)

Prerequisites

Python 3.12+ with virtualenv
Node.js 18+

python -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements-dev.txt
(cd frontend && npm install)   # subshell, so you stay in the project root

Starting

Via script:

cp .env.example .env   # set ADMIN_PASSWORD
./start.sh

The script checks whether any of the ports are already in use, initializes the database, and starts all three services.

Via PyCharm: start the "Dev" run configuration (starts the proxy, the admin API, and the Vite dev server together).

Admin interface: http://localhost:5173

Production (Docker)

Docker Compose (recommended)

docker compose up -d

Pulls the image from DockerHub and loads variables from .env.

The setup uses network_mode: host, so the container shares the host's network stack instead of getting its own virtual network. This is the right choice here for two reasons:

Ollama should not be reachable from outside. Ollama runs on the host and is bound to 127.0.0.1:11434, so it is reachable only locally. With a separate container network (bridge mode), localhost would refer to the container itself from the container's point of view, not to the host. The usual alternative (host.docker.internal + extra_hosts) is unreliable on Linux.
No duplicate port mapping is needed. With network_mode: host, ports 8000 and 8001 are available directly on the host, without ports: entries in the Compose file.

Build and push the image yourself

./build_push.sh

The script shows the current Git tag, offers to set a new one, builds the image for linux/arm64, and pushes it to mediaeng/llmproxy.

Port 8001 (admin)

All admin endpoints require the ADMIN_PASSWORD; the token is the primary protection. For additional hardening, the admin API can be restricted to local access:

ADMIN_HOST=127.0.0.1

With network_mode: host (the production default), this is the only effective method, because Docker port mapping does not apply there.

HTTPS via reverse proxy (untested)

To serve the proxy and the admin interface over HTTPS, you can put another reverse proxy (e.g. Nginx or Caddy) in front. With network_mode: host, both services listen directly on the host, and Nginx/Caddy proxy to localhost.

Caddy (recommended; automatic TLS via Let's Encrypt):

llm.example.com {
    reverse_proxy localhost:8000 {
        flush_interval -1
    }
}

llm-admin.example.com {
    reverse_proxy localhost:8001
}

Nginx (with Certbot certificates):

server {
    listen 443 ssl;
    server_name llm.example.com;

    ssl_certificate     /etc/letsencrypt/live/llm.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/llm.example.com/privkey.pem;

    location / {
        proxy_pass         http://127.0.0.1:8000;
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_buffering    off;   # required for streaming
        proxy_cache        off;
    }
}

server {
    listen 443 ssl;
    server_name llm-admin.example.com;

    ssl_certificate     /etc/letsencrypt/live/llm-admin.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/llm-admin.example.com/privkey.pem;

    location / {
        proxy_pass       http://127.0.0.1:8001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Clients then configure https://llm.example.com/v1 as the base URL.

Proxy endpoints (port 8000)

All endpoints require a valid API key in the Authorization header.

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer sk-xxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"Hallo"}]}'

Endpoint	Method	Description
`/v1/chat/completions`	POST	Chat (OpenAI format, streaming + tool use)
`/v1/models`	GET	Models (OpenAI format)
`/api/generate`	POST	Ollama generate (native)
`/api/chat`	POST	Ollama chat (native)
`/api/tags`	GET	Available models
`/api/versions`	GET	Ollama version
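
The same chat request as the curl call above can be built from Python with the standard library alone. This is a minimal sketch: `build_chat_request` is a hypothetical helper, and the key `sk-xxxxxx` and model `llama3` are the placeholders from the curl example.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Prepare a POST to the proxy's OpenAI-compatible chat endpoint."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("http://localhost:8000", "sk-xxxxxx", "llama3", "Hallo")
# To actually send it (requires a running proxy and a valid key):
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
```

Since the endpoint is described as OpenAI-compatible, existing OpenAI client libraries should also work when pointed at the proxy's `/v1` base URL, although that is not verified here.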

Admin API (port 8001)

All endpoints require Authorization: Bearer <ADMIN_PASSWORD>.

Endpoint	Method	Description
`/api/api-keys`	GET	All API keys with usage data
`/api/api-keys`	POST	Create a new API key
`/api/api-keys/{id}/quota`	PATCH	Update the limits of a key
`/api/api-keys/{id}/activate`	PUT	Activate an API key
`/api/api-keys/{id}/deactivate`	PUT	Deactivate an API key
`/api/api-keys/{id}`	DELETE	Delete an API key
`/api/settings`	GET/PUT	Ollama URL and forced model (`force_model`, the model lock)
`/api/ollama-models`	GET	Available models from Ollama
`/api/proxy-info`	GET	Local proxy endpoint
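
Admin calls follow the same Bearer pattern, with the admin password as the token. The sketch below only prepares requests for endpoints from the table; `build_admin_request` is a hypothetical helper, the key id `3` is made up, and request bodies for POST/PATCH are left out because their schemas are not documented here.

```python
import urllib.request


def build_admin_request(
    base_url: str, admin_password: str, path: str, method: str = "GET"
) -> urllib.request.Request:
    """Prepare an authenticated request against the admin API (port 8001)."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {admin_password}"},
        method=method,
    )


# List all keys with their usage data.
list_keys = build_admin_request("http://localhost:8001", "change-me", "/api/api-keys")
# Deactivate the key with the (made-up) id 3.
deactivate = build_admin_request(
    "http://localhost:8001", "change-me", "/api/api-keys/3/deactivate", method="PUT"
)
```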

Tests

cd backend
python -m pytest tests/ -v

Project structure

llm_quota/
├── backend/
│   ├── main.py              # Proxy server (port 8000)
│   ├── admin.py             # Admin API + static file serving (port 8001)
│   ├── database.py          # DB connection & session
│   ├── models.py            # SQLAlchemy models (APIKey, Setting, Usage)
│   ├── schemas.py           # Pydantic schemas
│   ├── crud.py              # DB operations, token counting, quota logic
│   ├── init_db.py           # Create tables & seed settings
│   ├── requirements.txt     # Production dependencies
│   ├── requirements-dev.txt # Test dependencies
│   └── tests/
│       ├── conftest.py
│       ├── test_auth.py
│       └── test_quota.py
├── frontend/
│   └── src/
│       ├── main.jsx         # React admin UI
│       └── styles.css
├── .idea/runConfigurations/
│   └── Dev.xml              # PyCharm run config
├── Dockerfile
├── docker-compose.yml       # Production start with the DockerHub image
├── docker-entrypoint.sh
├── .dockerignore
├── start.sh                 # Development start script
├── run_dev.py               # Development runner for PyCharm
├── build_push.sh            # Docker build & push to DockerHub
├── DOCKERHUB.md             # DockerHub description (German)
├── DOCKERHUB.en.md          # DockerHub description (English)
└── .gitignore

License

MIT
