# Ollama Proxy with API Keys and Quotas

A reverse proxy for Ollama with API-key authentication, quota management, and a web admin interface.

## Features

- API-key authentication (Bearer token or `sk-` prefix)
- Optional expiry date per API key
- Quota management with separate daily and monthly limits (tokens and requests)
- Token counting via tiktoken; quota resets follow the configured time zone (`APP_TZ`, default `Europe/Berlin`)
- Web admin interface (manage API keys, Ollama settings, proxy info)
- OpenAI-compatible `/v1/chat/completions` endpoint
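
The daily and monthly reset boundaries mentioned above can be sketched as follows. This is a minimal illustration that assumes quotas reset at local midnight and on the first day of the month; the function names `day_window` and `month_window` are hypothetical and not taken from the project's `crud.py`.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumed default, mirroring the APP_TZ setting.
APP_TZ = ZoneInfo("Europe/Berlin")


def day_window(now: datetime, tz: ZoneInfo = APP_TZ) -> tuple[datetime, datetime]:
    """Return [start, end) of the local calendar day that contains `now`."""
    local = now.astimezone(tz)
    start = local.replace(hour=0, minute=0, second=0, microsecond=0)
    # Adding a timedelta to an aware datetime is wall-clock arithmetic,
    # so this is the next local midnight even on DST change days.
    end = start + timedelta(days=1)
    return start, end


def month_window(now: datetime, tz: ZoneInfo = APP_TZ) -> tuple[datetime, datetime]:
    """Return [start, end) of the local calendar month that contains `now`."""
    local = now.astimezone(tz)
    start = local.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if start.month == 12:
        end = datetime(start.year + 1, 1, 1, tzinfo=tz)
    else:
        end = datetime(start.year, start.month + 1, 1, tzinfo=tz)
    return start, end


if __name__ == "__main__":
    start, end = day_window(datetime.now(timezone.utc))
    print("daily quota window:", start.isoformat(), "to", end.isoformat())
```

Usage rows would then be counted when their timestamp falls inside the returned half-open interval, which keeps a request at exactly midnight in the new window.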

## Security

- Admin interface is password-protected (`ADMIN_PASSWORD`)
- In local development, the admin API binds to `127.0.0.1` (not reachable from outside); in the Docker image it binds to `0.0.0.0`, see the note in the Docker section
- API keys are stored in the database as SHA-256 hashes; the plaintext is shown only once, at creation
- Quota check is atomic via `SELECT FOR UPDATE` (no time-of-check/time-of-use race)
- CORS origins are configurable via `ALLOWED_ORIGINS`
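
To illustrate the hashed key storage described in the list above, here is a minimal sketch. The helper names (`generate_api_key`, `hash_api_key`, `verify_api_key`) and the key length are assumptions for illustration; the project's actual code may differ.

```python
import hashlib
import hmac
import secrets


def hash_api_key(plaintext: str) -> str:
    """Return the SHA-256 hex digest that would be stored in the database."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def generate_api_key() -> tuple[str, str]:
    """Create a new key; return (plaintext shown once, hash to store)."""
    # The "sk-" prefix follows the README; the 32-byte length is an assumption.
    plaintext = "sk-" + secrets.token_urlsafe(32)
    return plaintext, hash_api_key(plaintext)


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Compare the hash of the presented key with the stored hash."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(hash_api_key(presented), stored_hash)
```

Because only the digest is persisted, a lost key cannot be recovered from the database and has to be replaced with a new one.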

## Configuration

Create a `.env` file in the project directory (template: `.env.example`):

```env
ADMIN_PASSWORD=change-me
PROXY_HOST=0.0.0.0
PROXY_PORT=8000
ADMIN_PORT=8001
DATABASE_URL=sqlite:///./test.db
OLLAMA_URL=http://localhost:11434
DEFAULT_MODEL=llama3
APP_TZ=Europe/Berlin
```

| Variable | Default | Description |
|----------|---------|-------------|
| `ADMIN_PASSWORD` | none | Password for the admin interface (**required**) |
| `PROXY_HOST` | `0.0.0.0` | Bind address of the proxy |
| `PROXY_PORT` | `8000` | Port of the proxy |
| `ADMIN_PORT` | `8001` | Port of the admin API |
| `DATABASE_URL` | `sqlite:///./test.db` | Database connection string |
| `OLLAMA_URL` | `http://localhost:11434` | Address of the Ollama instance (can also be changed in the UI) |
| `DEFAULT_MODEL` | `llama3` | Default model for `/v1/chat/completions` (can also be changed in the UI) |
| `APP_TZ` | `Europe/Berlin` | Time zone for daily/monthly quota resets |
| `ALLOWED_ORIGINS` | `http://localhost:5173` | Comma-separated CORS origins |

## Development (local)

```bash
cp .env.example .env
# Set ADMIN_PASSWORD in .env

./start.sh
```

The script checks that all required ports are free, automatically activates an existing `.venv`, initializes the database, and starts the proxy, the admin API, and the Vite dev server.

Admin interface: `http://localhost:5173`
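
The port check that the start script performs can be pictured with the following Python sketch. It is not the content of `start.sh`; the function names are hypothetical, and the port list (8000, 8001, 5173) is assumed from the defaults documented in this README.

```python
import socket

# Assumed ports: proxy, admin API, Vite dev server.
DEV_PORTS = (8000, 8001, 5173)


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports: tuple[int, ...] = DEV_PORTS) -> list[int]:
    """Return the subset of ports that are already occupied."""
    return [p for p in ports if port_in_use(p)]


if __name__ == "__main__":
    occupied = busy_ports()
    if occupied:
        raise SystemExit(f"Ports already in use: {occupied}")
    print("All development ports are free.")
```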

### Prerequisites

- Python 3.12+ with virtualenv
- Node.js 18+

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements-dev.txt

cd frontend && npm install
```

## Production (Docker)

### Build the image

```bash
docker build -t llm-quota .
```

### Start the container

```bash
docker run -d \
  -p 8000:8000 \
  -p 8001:8001 \
  -e ADMIN_PASSWORD=change-me \
  -e OLLAMA_URL=http://host.docker.internal:11434 \
  -e DATABASE_URL=sqlite:///./data/quota.db \
  -v "$(pwd)/data:/app/backend/data" \
  --name llm-quota \
  llm-quota
```

| Port | Service |
|------|---------|
| `8000` | Proxy (for LLM clients) |
| `8001` | Admin API + admin interface |

Admin interface: `http://localhost:8001`

### With PostgreSQL

```bash
docker run -d \
  -p 8000:8000 \
  -p 8001:8001 \
  -e ADMIN_PASSWORD=change-me \
  -e DATABASE_URL=postgresql://user:pass@db-host:5432/llm_quota \
  -e OLLAMA_URL=http://ollama:11434 \
  llm-quota
```

> **Note:** Inside the container, the admin API binds to `0.0.0.0`. Port 8001 should not be exposed publicly: either restrict it with a firewall or run it behind a reverse proxy (nginx, Caddy).

## Proxy endpoints (port 8000)

All endpoints require a valid API key in the `Authorization` header.

```bash
curl -X POST http://localhost:8000/api/chat \
  -H "Authorization: Bearer sk-xxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"Hallo"}]}'
```

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/generate` | POST | Ollama generate |
| `/api/chat` | POST | Ollama chat |
| `/api/tags` | GET | Available models |
| `/api/versions` | GET | Ollama version |
| `/v1/models` | GET | Models (OpenAI format) |
| `/v1/chat/completions` | POST | Chat (OpenAI format) |
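
For clients written in Python, a call to the OpenAI-compatible endpoint might look like the sketch below, which uses only the standard library. The helper names are hypothetical, and the base URL, model, and request body simply mirror the defaults and the `curl` example in this README; the exact response shape is not specified here, so the parsed JSON is returned as-is.

```python
import json
import urllib.request


def build_chat_request(
    api_key: str,
    content: str,
    base_url: str = "http://localhost:8000",
    model: str = "llama3",
) -> urllib.request.Request:
    """Build a POST request for /v1/chat/completions with the API key attached."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(api_key: str, content: str) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_chat_request(api_key, content), timeout=60) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Replace the placeholder with a key created in the admin interface.
    print(chat("sk-xxxxxx", "Hallo"))
```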

## Admin API (port 8001)

All endpoints require `Authorization: Bearer <ADMIN_PASSWORD>`.

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/api-keys` | GET | List all API keys |
| `/api/api-keys` | POST | Create a new API key |
| `/api/api-keys/{id}/deactivate` | PUT | Deactivate an API key |
| `/api/api-keys/{id}/quota` | PATCH | Update the quota of a key |
| `/api/settings` | GET/PUT | Ollama URL and default model |
| `/api/ollama-models` | GET | Available models from Ollama |
| `/api/proxy-info` | GET | Local proxy endpoint |
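
A small sketch of how the admin endpoints in the table above could be addressed from a script follows. Only the two endpoints that need no request body are shown, because the body schemas for creating keys and updating quotas are not documented here; the helper names and the base URL are assumptions.

```python
import urllib.request

# Assumed local address of the admin API (default ADMIN_PORT).
ADMIN_BASE = "http://localhost:8001"


def admin_request(path: str, password: str, method: str = "GET") -> urllib.request.Request:
    """Build an admin API request that carries the admin password as Bearer token."""
    return urllib.request.Request(
        f"{ADMIN_BASE}{path}",
        headers={"Authorization": f"Bearer {password}"},
        method=method,
    )


def list_keys_request(password: str) -> urllib.request.Request:
    """GET /api/api-keys"""
    return admin_request("/api/api-keys", password)


def deactivate_key_request(key_id: int, password: str) -> urllib.request.Request:
    """PUT /api/api-keys/{id}/deactivate"""
    return admin_request(f"/api/api-keys/{key_id}/deactivate", password, method="PUT")
```

Each returned request can be sent with `urllib.request.urlopen(...)` once the admin API is running.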

## Tests

```bash
cd backend
python -m pytest tests/ -v
```

## Project structure

```
llm_quota/
├── backend/
│   ├── main.py              # Proxy server (port 8000)
│   ├── admin.py             # Admin API + static file serving (port 8001)
│   ├── database.py          # DB connection & session
│   ├── models.py            # SQLAlchemy models (APIKey, Setting, Usage)
│   ├── schemas.py           # Pydantic schemas
│   ├── crud.py              # DB operations, token counting, quota logic
│   ├── init_db.py           # Create tables & seed settings
│   ├── setup_admin.py       # Create the default API key
│   ├── requirements.txt     # Production dependencies
│   ├── requirements-dev.txt # Test dependencies
│   └── tests/
│       ├── conftest.py      # Fixtures
│       ├── test_auth.py     # Authentication tests
│       └── test_quota.py    # Quota, token, and expiry tests
├── frontend/
│   └── src/
│       ├── main.jsx         # React admin UI
│       └── styles.css
├── Dockerfile
├── docker-entrypoint.sh
├── .dockerignore
├── .env.example
├── start.sh                 # Development start script
└── .gitignore
```

## License

MIT