Remove redundant run_tests.py wrapper

Remove obsolete test_api.sh
Remove docs/ from tracking, add to gitignore
2026-05-10 11:51:13 +02:00 · 2026-05-10 11:47:15 +02:00 · 2026-05-10 11:44:10 +02:00 · 2026-05-10 11:21:39 +02:00 · 2026-05-10 10:50:09 +02:00 · 2026-05-10 10:45:41 +02:00
24 changed files with 1139 additions and 91 deletions
--- a/.dockerignore
+++ b/.dockerignore
@@ -42,6 +42,8 @@ docker-compose.yml

 # Docs
 *.md
+*.tex
+*.pdf

 # Dev & build scripts
 run_dev.py
--- a/.env.example
+++ b/.env.example
@@ -16,3 +16,8 @@ DATABASE_URL=sqlite:///./test.db
 OLLAMA_URL=http://localhost:11434
 DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
+
+# Default model for the Anthropic-compatible endpoint (/v1/messages)
+# Used when the client does not specify a model or when an Anthropic model name
+# (e.g. claude-opus-4-7) does not match any local model.
+ANTHROPIC_DEFAULT_MODEL=llama3
--- a/.gitignore
+++ b/.gitignore
@@ -27,3 +27,10 @@ frontend/dist/

 # Misc
 config.json
+
+# Generated documents
+KURZANLEITUNG.tex
+KURZANLEITUNG.pdf
+
+# Internal planning docs
+docs/
--- a/DOCKERHUB.en.md
+++ b/DOCKERHUB.en.md
@@ -1,12 +1,14 @@
 # mediaeng/llmproxy

-A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with configurable token and request quotas. Incoming requests in OpenAI-compatible format are authenticated, checked against the quota, and forwarded to the configured Ollama server.
+A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with configurable token and request quotas. Incoming requests in OpenAI-compatible or Anthropic-compatible format are authenticated, checked against the quota, and forwarded to the configured Ollama server.

 ## Features

 - OpenAI-compatible endpoint (`/v1/chat/completions`, `/v1/models`)
+- Anthropic Messages API (`/v1/messages`) — compatible with Claude Code CLI and Anthropic SDK clients
 - API key management with daily and monthly token/request limits
 - Web-based admin interface (port 8001)
+- Model lock: enforces a specific model for all requests (useful for courses and lab sessions)
 - Streaming support (Server-Sent Events)
 - Tool use / function calling passthrough
 - Rotating usage logs
@@ -16,7 +18,7 @@ A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API ke

 | Port | Service |
 |------|---------|
-| `8000` | Proxy endpoint (OpenAI API) |
+| `8000` | Proxy endpoint (OpenAI and Anthropic API) |
 | `8001` | Admin API + web interface |

 All API endpoints require the `ADMIN_PASSWORD` — without a valid token, only the public frontend files (HTML/JS/CSS of the login page) are accessible. The password is therefore the primary protection.
@@ -27,7 +29,6 @@ All API endpoints require the `ADMIN_PASSWORD` — without a valid token, only t
 |----------|---------|-------------|
 | `ADMIN_PASSWORD` | – | **Required.** Password for the admin interface |
 | `OLLAMA_URL` | `http://localhost:11434` | URL of the Ollama server (without `/v1` suffix) |
-| `DEFAULT_MODEL` | `llama3` | Model used when the client does not specify one |
 | `DATABASE_URL` | `sqlite:///./test.db` | Database connection string (SQLite or PostgreSQL) |
 | `PROXY_HOST` | `0.0.0.0` | Proxy bind address |
 | `PROXY_PORT` | `8000` | Proxy port |
@@ -35,6 +36,7 @@ All API endpoints require the `ADMIN_PASSWORD` — without a valid token, only t
 | `ADMIN_PORT` | `8001` | Admin API port |
 | `APP_TZ` | `Europe/Berlin` | Timezone for daily/monthly quota resets |
 | `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
+| `ANTHROPIC_DEFAULT_MODEL` | – | Default model for `/v1/messages` (Ollama model name, e.g. `llama3`) |

 ## Docker Compose – Ollama on the Host (Linux, recommended)

@@ -59,8 +61,8 @@ volumes:
 ```env
 ADMIN_PASSWORD=changeme
 OLLAMA_URL=http://localhost:11434
-DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
+ANTHROPIC_DEFAULT_MODEL=llama3
 ```

 ## Docker Compose – Ollama as Container, SQLite
@@ -78,8 +80,8 @@ services:
    environment:
      ADMIN_PASSWORD: changeme
      OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
      APP_TZ: Europe/Berlin
+      ANTHROPIC_DEFAULT_MODEL: llama3
    volumes:
      - llmproxy-data:/app/backend
    depends_on:
@@ -111,9 +113,9 @@ services:
    environment:
      ADMIN_PASSWORD: changeme
      OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
      APP_TZ: Europe/Berlin
      DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
+      ANTHROPIC_DEFAULT_MODEL: llama3
    depends_on:
      db:
        condition: service_healthy
@@ -148,9 +150,23 @@ volumes:

 ## Client Configuration

-Configure the proxy as an OpenAI-compatible endpoint:
-
+**OpenAI-compatible client:**
 ```
 Base URL:  http://<host>:8000/v1
 API Key:   <API key created in the admin interface>
 ```
+
+**Claude Code CLI:**
+```bash
+ANTHROPIC_BASE_URL=http://<host>:8000 \
+ANTHROPIC_AUTH_TOKEN=<API key created in the admin interface> \
+claude
+```
+
+## Acknowledgements
+
+The Anthropic Messages API endpoint (`/v1/messages`) was inspired by [free-claude-code](https://github.com/Alishahryar1/free-claude-code) by Ali Khokhar, which pursues a similar approach for routing Claude Code requests to alternative LLM backends.
+
+## License
+
+MIT — © 2026 Oliver Hofmann. See [LICENSE](https://git.efi.th-nuernberg.de/gitea/hofmannol/llmproxy/src/branch/main/LICENSE) for details.
--- a/DOCKERHUB.md
+++ b/DOCKERHUB.md
@@ -1,12 +1,14 @@
 # mediaeng/llmproxy

-Ein schlanker Reverse-Proxy für [Ollama](https://ollama.com), der API-Keys mit konfigurierbaren Token- und Request-Quoten verwaltet. Eingehende Anfragen im OpenAI-kompatiblen Format werden authentifiziert, auf Quota geprüft und an den konfigurierten Ollama-Server weitergeleitet.
+Ein schlanker Reverse-Proxy für [Ollama](https://ollama.com), der API-Keys mit konfigurierbaren Token- und Request-Quoten verwaltet. Eingehende Anfragen im OpenAI-kompatiblen oder Anthropic-kompatiblen Format werden authentifiziert, auf Quota geprüft und an den konfigurierten Ollama-Server weitergeleitet.

 ## Funktionen

 - OpenAI-kompatibler Endpunkt (`/v1/chat/completions`, `/v1/models`)
+- Anthropic Messages API (`/v1/messages`) — kompatibel mit Claude Code CLI und Anthropic-SDK-Clients
 - API-Key-Verwaltung mit tages- und monatlichen Token-/Request-Limits
 - Web-basierte Admin-Oberfläche (Port 8001)
+- Modell-Lock: erzwingt ein bestimmtes Modell für alle Requests (nützlich für Praktika/Kurse)
 - Streaming-Support (Server-Sent Events)
 - Tool-Use / Function Calling wird durchgereicht
 - Rotierende Nutzungs-Logs
@@ -16,7 +18,7 @@ Ein schlanker Reverse-Proxy für [Ollama](https://ollama.com), der API-Keys mit

 | Port | Dienst |
 |------|--------|
-| `8000` | Proxy-Endpunkt (OpenAI-API) |
+| `8000` | Proxy-Endpunkt (OpenAI- und Anthropic-API) |
 | `8001` | Admin-API + Web-Oberfläche |

 Alle API-Endpunkte erfordern das `ADMIN_PASSWORD` — ein Zugriff ohne gültiges Token liefert nur die öffentlichen Frontend-Dateien (HTML/JS/CSS der Login-Seite). Das Passwort ist damit die primäre Schutzmaßnahme.
@@ -27,7 +29,6 @@ Alle API-Endpunkte erfordern das `ADMIN_PASSWORD` — ein Zugriff ohne gültiges
 |----------|----------|--------------|
 | `ADMIN_PASSWORD` | – | **Pflicht.** Passwort für die Admin-Oberfläche |
 | `OLLAMA_URL` | `http://localhost:11434` | URL des Ollama-Servers (ohne `/v1`-Suffix) |
-| `DEFAULT_MODEL` | `llama3` | Modell, das verwendet wird wenn der Client keines angibt |
 | `DATABASE_URL` | `sqlite:///./test.db` | Datenbank-Verbindungsstring (SQLite oder PostgreSQL) |
 | `PROXY_HOST` | `0.0.0.0` | Bind-Adresse des Proxy |
 | `PROXY_PORT` | `8000` | Port des Proxy |
@@ -35,6 +36,7 @@ Alle API-Endpunkte erfordern das `ADMIN_PASSWORD` — ein Zugriff ohne gültiges
 | `ADMIN_PORT` | `8001` | Port der Admin-API |
 | `APP_TZ` | `Europe/Berlin` | Zeitzone für Tages-/Monats-Reset der Quoten |
 | `LOG_FILE` | `logs/usage.log` | Pfad der rotierenden Nutzungs-Logdatei |
+| `ANTHROPIC_DEFAULT_MODEL` | – | Standard-Modell für `/v1/messages` (Ollama-Modellname, z. B. `llama3`) |

 ## Docker Compose – Ollama auf dem Host (Linux, empfohlen)

@@ -59,8 +61,8 @@ volumes:
 ```env
 ADMIN_PASSWORD=changeme
 OLLAMA_URL=http://localhost:11434
-DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
+ANTHROPIC_DEFAULT_MODEL=llama3
 ```

 ## Docker Compose – Ollama als Container, SQLite
@@ -78,8 +80,8 @@ services:
    environment:
      ADMIN_PASSWORD: changeme
      OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
      APP_TZ: Europe/Berlin
+      ANTHROPIC_DEFAULT_MODEL: llama3
    volumes:
      - llmproxy-data:/app/backend
    depends_on:
@@ -111,9 +113,9 @@ services:
    environment:
      ADMIN_PASSWORD: changeme
      OLLAMA_URL: http://ollama:11434
-      DEFAULT_MODEL: llama3
      APP_TZ: Europe/Berlin
      DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
+      ANTHROPIC_DEFAULT_MODEL: llama3
    depends_on:
      db:
        condition: service_healthy
@@ -148,9 +150,19 @@ volumes:

 ## Client-Konfiguration

-Den Proxy als OpenAI-kompatibler Endpunkt konfigurieren:
-
+**OpenAI-kompatibler Client:**
 ```
 Base URL:  http://<host>:8000/v1
 API Key:   <angelegter API-Key aus der Admin-Oberfläche>
 ```
+
+**Claude Code CLI:**
+```bash
+ANTHROPIC_BASE_URL=http://<host>:8000 \
+ANTHROPIC_AUTH_TOKEN=<API-Key> \
+claude
+```
+
+## Lizenz
+
+MIT — © 2026 Oliver Hofmann. Details siehe [LICENSE](https://git.efi.th-nuernberg.de/gitea/hofmannol/llmproxy/src/branch/main/LICENSE).
--- a/2
+++ b/2
@@ -6,6 +6,8 @@ COPY frontend/ frontend/
 RUN npm run build --prefix frontend

 FROM python:3.12-slim
+ARG APP_VERSION=dev
+ENV APP_VERSION=$APP_VERSION
 WORKDIR /app

 COPY backend/requirements.txt .
--- a/KURZANLEITUNG.md
+++ b/KURZANLEITUNG.md
@@ -0,0 +1,204 @@
+# LLM-Dienst – Kurzanleitung
+
+## Worum geht es?
+
+Der Dienst stellt **große Sprachmodelle (LLMs)** über eine einfache HTTP-API bereit, die direkt aus Python-Skripten, Jupyter-Notebooks oder eigenen Anwendungen angesprochen werden kann. Die Modelle laufen lokal auf einem GPU-Server im Intranet – ohne Datenübertragung nach außen und ohne Cloud-Kosten.
+
+Typische Anwendungsfälle:
+
+- Texte zusammenfassen, übersetzen oder umformulieren
+- KI-gestütztes Coding (z.B. mit **[opencode](https://opencode.ai)**) 
+- Experimente mit Prompt-Engineering und LLM-Integration in eigene Projekte
+
+---
+
+## Zugang
+
+Der Dienst ist **nur im Intranet (VPN)** erreichbar.
+
+| | |
+|---|---|
+| **API-Endpunkt** | `http://141.75.33.244:8000` |
+| **Authentifizierung** | API-Key erforderlich (per E-Mail beim Admin anfragen) |
+
+---
+
+## Verfügbare Modelle
+
+| Modell | Größe | Hinweis |
+|---|---|---|
+| `gemma4:e4b` | 9,6 GB | sehr schnell, für einfache Aufgaben |
+| `gemma4:31b` | 19 GB | kompakt, schnell |
+| `gpt-oss:20b` | 13 GB | kompakt, schnell |
+| `gpt-oss:120b` | 65 GB | sehr leistungsfähig |
+| `qwen3.5:122b` | 81 GB | sehr leistungsfähig |
+| `qwen3-coder-next:q8_0` | 84 GB | speziell für Code |
+
+> **Wichtig:** Es kann immer nur **ein Modell gleichzeitig** im GPU-Speicher geladen sein.
+> Wechselt jemand das Modell, muss das vorherige entladen und das neue geladen werden –
+> das kann **mehrere Minuten** dauern. Der erste Prompt nach einem Modellwechsel ist
+> deshalb deutlich langsamer. Danach bleibt das Modell einige Zeit geladen.
+
+---
+
+## Python-Beispiel – Einfacher Prompt
+
+Das API folgt dem **OpenAI-Standard**, d.h. die `openai`-Bibliothek kann direkt verwendet werden.
+
+```bash
+pip install openai
+```
+
+```python
+from openai import OpenAI
+
+API_KEY = "sk-..."            # euren API-Key eintragen
+BASE_URL = "http://141.75.33.244:8000/v1"
+MODEL = "gemma4:31b"          # Modell nach Bedarf wählen
+
+client = OpenAI(api_key=API_KEY, base_url=BASE_URL)
+
+response = client.chat.completions.create(
+    model=MODEL,
+    messages=[
+        {"role": "user", "content": "Erkläre den Unterschied zwischen L1- und L2-Regularisierung."}
+    ]
+)
+
+print(response.choices[0].message.content)
+```
+
+---
+
+## Python-Beispiel – Verfügbare Modelle abfragen
+
+```python
+from openai import OpenAI
+
+API_KEY = "sk-..."
+BASE_URL = "http://141.75.33.244:8000/v1"
+
+client = OpenAI(api_key=API_KEY, base_url=BASE_URL)
+
+models = client.models.list()
+for m in models.data:
+    print(m.id)
+```
+
+---
+
+## Aktuell geladenes Modell abfragen
+
+Da immer nur ein Modell gleichzeitig im Speicher sein kann, lässt sich mit folgendem Aufruf prüfen, welches Modell gerade aktiv ist:
+
+```python
+import httpx
+
+r = httpx.get(
+    "http://141.75.33.244:8000/api/ps",
+    headers={"Authorization": "Bearer sk-..."}
+)
+print(r.json())
+```
+
+Die Antwort enthält Modellname, Größe und wie lange das Modell noch im Speicher bleibt.
+
+---
+
+## Empfehlungen zur Nutzung
+
+- **Kleines Modell zuerst** (`gemma4:31b` oder `gpt-oss:20b`) – viel schneller, für viele Aufgaben ausreichend.
+- **Großes Modell** nur bei komplexen Aufgaben (`qwen3.5:122b`, `gpt-oss:120b`).
+- **Code-Aufgaben**: `qwen3-coder-next:q8_0` ist speziell dafür optimiert.
+- Wenn möglich, **dasselbe Modell wie andere Nutzer** verwenden, um häufige Modellwechsel zu vermeiden.
+
+---
+
+## Quotas
+
+Je nach API-Key können folgende Limits konfiguriert sein:
+
+- Maximale **Anfragen pro Tag / Monat**
+- Maximale **Tokens pro Tag / Monat**
+
+Bei Überschreitung gibt die API den Statuscode `429 Too Many Requests` zurück.
+
+---
+
+## Coding-Assistent: opencode
+
+[opencode](https://opencode.ai) ist ein terminal-basierter KI-Coding-Agent (ähnlich Claude Code), der OpenAI-kompatible APIs unterstützt und damit direkt auf den Intranet-Dienst zeigen kann.
+
+### Installation
+
+```bash
+npm install -g opencode-ai
+# oder
+curl -fsSL https://opencode.ai/install | bash
+```
+
+### Konfiguration
+
+Konfigurationsdatei anlegen unter `~/.config/opencode/config.json`:
+
+```json
+{
+  "$schema": "https://opencode.ai/config.json",
+  "providers": {
+    "openai": {
+      "apiKey": "sk-...",
+      "baseURL": "http://141.75.33.244:8000/v1"
+    }
+  },
+  "model": "openai/qwen3-coder-next:q8_0"
+}
+```
+
+Für Code-Aufgaben empfiehlt sich `qwen3-coder-next:q8_0`, für allgemeine Aufgaben `gemma4:31b` oder `gpt-oss:20b`.
+
+### Starten
+
+```bash
+opencode
+```
+
+opencode öffnet eine interaktive Terminal-Oberfläche und kann dann im Projektverzeichnis eingesetzt werden – Dateien lesen, Code generieren, Refactoring vorschlagen usw.
+
+---
+
+## Coding-Assistent: Claude Code
+
+[Claude Code](https://claude.ai/code) ist Anthropics offizieller KI-Coding-Agent für das Terminal. Wer bereits einen Claude-Code-Zugang hat, kann ihn über den Intranet-Dienst mit lokalen Modellen betreiben — ohne Daten an Anthropic zu übertragen.
+
+### Voraussetzung
+
+Ein aktiver Claude-Code-Zugang (Claude Pro oder Team).
+
+### Starten
+
+```bash
+ANTHROPIC_BASE_URL=http://141.75.33.244:8000 \
+ANTHROPIC_AUTH_TOKEN=sk-... \
+claude
+```
+
+Das zu verwendende Modell wird vom Admin über `ANTHROPIC_DEFAULT_MODEL` vorkonfiguriert — eine manuelle Modellauswahl ist nicht nötig.
+
+---
+
+## Administration (nur für Admins)
+
+Das Web-Interface zur Verwaltung von API-Keys und Quotas ist erreichbar unter:
+
+**`http://141.75.33.244:8001`**
+
+Dort können API-Keys angelegt, deaktiviert und mit Quotas versehen werden.
+
+### Modell-Lock für Praktika
+
+Unter **Einstellungen → Aktives Modell (Lock)** kann ein Modell fest vorgegeben werden. Ist ein Lock gesetzt, wird das `model`-Feld in jedem Request durch dieses Modell ersetzt – unabhängig davon, was der Client schickt. Das verhindert unkoordinierte Modellwechsel während einer Veranstaltung, die alle Teilnehmenden durch lange Ladezeiten ausbremsen würden.
+
+Typischer Ablauf für ein Praktikum:
+1. Vor der Veranstaltung: passendes Modell in Ollama laden
+2. Lock in der Admin-Oberfläche aktivieren
+3. Nach der Veranstaltung: Lock wieder deaktivieren (Feld leeren)
--- a/27
+++ b/27
@@ -0,0 +1,27 @@
+MIT License
+
+Copyright (c) 2026 Oliver Hofmann
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+---
+
+Portions of this software were inspired by free-claude-code
+(https://github.com/Alishahryar1/free-claude-code),
+copyright (c) 2026 Ali Khokhar, MIT License.
--- a/README.md
+++ b/README.md
@@ -4,12 +4,13 @@ Ollama bietet von sich aus keine Authentifizierung — wer die API erreicht, kan

 ## Features

-- API-Key-Authentifizierung (Bearer Token oder `sk-`-Prefix)
+- API-Key-Authentifizierung (Bearer Token, `sk-`-Prefix, `x-api-key`- und `anthropic-auth-token`-Header)
 - Optionales Ablaufdatum pro API-Key
 - Quota-Management mit getrennten Tages- und Monatslimits (Tokens & Requests)
-- Token-Zählung via tiktoken, Reset-Grenzen in der Zeitzone Europe/Berlin
+- Token-Zählung via tiktoken, Reset-Grenzen in der konfigurierten Zeitzone
 - Web-Admin-Oberfläche (API-Keys verwalten, Ollama-Einstellungen, Verbrauchsanzeige)
 - OpenAI-kompatibler `/v1/chat/completions`-Endpunkt mit Streaming und Tool-Use
+- Anthropic Messages API `/v1/messages` — kompatibel mit Claude Code CLI und Anthropic-SDK-Clients
 - Rotierende Nutzungs-Logs
 - SQLite (Standard) oder PostgreSQL
 - Docker-Image auf DockerHub: `mediaeng/llmproxy`
@@ -33,9 +34,9 @@ ADMIN_HOST=0.0.0.0
 ADMIN_PORT=8001
 DATABASE_URL=sqlite:///./test.db
 OLLAMA_URL=http://localhost:11434
-DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
 LOG_FILE=logs/usage.log
+ANTHROPIC_DEFAULT_MODEL=llama3
 ```

 | Variable | Standard | Beschreibung |
@@ -47,10 +48,10 @@ LOG_FILE=logs/usage.log
 | `ADMIN_PORT` | `8001` | Port der Admin-API |
 | `DATABASE_URL` | `sqlite:///./test.db` | DB-Verbindungsstring (SQLite oder PostgreSQL) |
 | `OLLAMA_URL` | `http://localhost:11434` | Adresse der Ollama-Instanz (auch in der UI änderbar) |
-| `DEFAULT_MODEL` | `llama3` | Standard-Modell für `/v1/chat/completions` (auch in der UI änderbar) |
 | `APP_TZ` | `Europe/Berlin` | Zeitzone für tägliche/monatliche Quota-Resets |
 | `LOG_FILE` | `logs/usage.log` | Pfad der rotierenden Nutzungs-Logdatei |
 | `ALLOWED_ORIGINS` | `http://localhost:5173` | CORS-Origins (nur für Entwicklung relevant) |
+| `ANTHROPIC_DEFAULT_MODEL` | — | Standard-Modell für `/v1/messages` (Ollama-Modellname) |

 ## Entwicklung (lokal)

@@ -80,6 +81,23 @@ Das Script prüft alle Ports auf Belegung, initialisiert die Datenbank und start

 Admin-Oberfläche: `http://localhost:5173`

+## Claude Code CLI
+
+Der Proxy stellt einen Anthropic-kompatiblen Endpunkt bereit, über den Claude Code CLI mit lokalen Ollama-Modellen genutzt werden kann.
+
+```bash
+# ANTHROPIC_DEFAULT_MODEL in .env setzen, dann:
+./start_claude.sh
+
+# Oder mit Key als Argument:
+./start_claude.sh sk-dein-api-key
+
+# Oder als Umgebungsvariable:
+PROXY_API_KEY=sk-dein-api-key ./start_claude.sh
+```
+
+Das Script setzt `ANTHROPIC_BASE_URL` und `ANTHROPIC_AUTH_TOKEN` automatisch aus der `.env` und startet `claude`.
+
 ## Produktion (Docker)

 ### Docker Compose (empfohlen)
@@ -170,17 +188,26 @@ Clients konfigurieren dann `https://llm.example.com/v1` als Base URL.

 ## Proxy-Endpunkte (Port 8000)

-Alle Endpunkte erfordern einen gültigen API-Key im `Authorization`-Header.
+Alle Endpunkte erfordern einen gültigen API-Key im `Authorization`-Header (`Bearer sk-...`), im `x-api-key`-Header oder im `anthropic-auth-token`-Header.

 ```bash
+# OpenAI-kompatibler Endpunkt
 curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer sk-xxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"Hallo"}]}'
+
+# Anthropic-kompatibler Endpunkt (z. B. für Claude Code)
+curl -X POST http://localhost:8000/v1/messages \
+  -H "x-api-key: sk-xxxxxx" \
+  -H "anthropic-version: 2023-06-01" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"llama3","messages":[{"role":"user","content":"Hallo"}],"max_tokens":1024}'
 ```

 | Endpunkt | Methode | Beschreibung |
 |----------|---------|--------------|
+| `/v1/messages` | POST | Chat (Anthropic-Format, Streaming + Tool-Use) |
 | `/v1/chat/completions` | POST | Chat (OpenAI-Format, Streaming + Tool-Use) |
 | `/v1/models` | GET | Modelle (OpenAI-Format) |
 | `/api/generate` | POST | Ollama generate (nativ) |
@@ -228,7 +255,8 @@ llm_quota/
 │   └── tests/
 │       ├── conftest.py
 │       ├── test_auth.py
-│       └── test_quota.py
+│       ├── test_quota.py
+│       └── test_anthropic_messages.py
 ├── frontend/
 │   └── src/
 │       ├── main.jsx         # React-Admin-UI
@@ -240,13 +268,19 @@ llm_quota/
 ├── docker-entrypoint.sh
 ├── .dockerignore
 ├── start.sh                 # Entwicklungs-Startscript
+├── start_claude.sh          # Claude Code CLI mit Proxy starten
 ├── run_dev.py               # Entwicklungs-Runner für PyCharm
 ├── build_push.sh            # Docker-Build & Push zu DockerHub
+├── LICENSE
 ├── DOCKERHUB.md             # DockerHub-Beschreibung (deutsch)
 ├── DOCKERHUB.en.md          # DockerHub-Beschreibung (englisch)
 └── .gitignore
 ```

+## Danksagung
+
+Der Anthropic-kompatible Endpunkt (`/v1/messages`) wurde durch das Projekt [free-claude-code](https://github.com/Alishahryar1/free-claude-code) von Ali Khokhar inspiriert, das einen ähnlichen Ansatz für das Weiterleiten von Claude-Code-Anfragen an alternative LLM-Backends verfolgt.
+
 ## Lizenz

-MIT
+MIT — siehe [LICENSE](LICENSE)
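
Reviewer sketch (not part of the commit): the `curl` call against `/v1/messages` in the README hunk above can be mirrored from Python with the standard library alone. The helper name `build_messages_request` and its defaults are hypothetical. The endpoint path, the `x-api-key` and `anthropic-version` headers, and the payload fields are taken from the README example. The function only builds the request object and does not send it.

```python
import json
import urllib.request


def build_messages_request(base_url: str, api_key: str, prompt: str,
                           model: str = "llama3", max_tokens: int = 1024) -> urllib.request.Request:
    """Build (but do not send) a POST request for the proxy's /v1/messages endpoint.

    Hypothetical helper for illustration; headers and payload mirror the
    curl example in the README hunk.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_messages_request("http://localhost:8000", "sk-xxxxxx", "Hallo")
```

Passing `req` to `urllib.request.urlopen` would send it. That step is left out here because it needs a running proxy and a valid key.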
--- a/backend/admin.py
+++ b/backend/admin.py
@@ -131,13 +131,16 @@ async def get_proxy_info(_ = Depends(require_admin_auth)):
    host = os.getenv("PROXY_HOST", "0.0.0.0")
    port = os.getenv("PROXY_PORT", "8000")
    display_host = "localhost" if host in ("0.0.0.0", "::") else host
-    return {"endpoint": f"http://{display_host}:{port}"}
+    return {
+        "endpoint": f"http://{display_host}:{port}",
+        "version": os.getenv("APP_VERSION", "dev"),
+    }

 @app.get("/api/settings", response_model=schemas.Settings)
 async def read_settings(db: Session = Depends(get_db), _ = Depends(require_admin_auth)):
    return schemas.Settings(
        ollama_url=crud.get_setting(db, "ollama_url", "http://localhost:11434"),
-        default_model=crud.get_setting(db, "default_model", "llama3"),
+        force_model=crud.get_setting(db, "force_model") or None,
    )

 @app.put("/api/settings", response_model=schemas.Settings)
@@ -148,8 +151,8 @@ async def update_settings(
 ):
    ollama_url = settings.ollama_url.rstrip('/').removesuffix('/v1')
    crud.set_setting(db, "ollama_url", ollama_url)
-    crud.set_setting(db, "default_model", settings.default_model)
-    return schemas.Settings(ollama_url=ollama_url, default_model=settings.default_model)
+    crud.set_setting(db, "force_model", settings.force_model or "")
+    return schemas.Settings(ollama_url=ollama_url, force_model=settings.force_model or None)

 @app.get("/api/ollama-models")
 async def get_ollama_models(
@@ -166,6 +169,18 @@ async def get_ollama_models(
    except Exception:
        return {"models": [], "reachable": False}

+@app.get("/api/logs/{name}")
+async def get_log_lines(name: str, _ = Depends(require_admin_auth)):
+    if name not in ("usage", "error"):
+        raise HTTPException(status_code=400, detail="name must be 'usage' or 'error'")
+    log_file = Path(os.getenv("LOG_FILE", "logs/usage.log"))
+    path = log_file if name == "usage" else log_file.parent / "error.log"
+    try:
+        lines = path.read_text(encoding="utf-8").splitlines()
+        return {"lines": lines[-10:]}
+    except FileNotFoundError:
+        return {"lines": []}
+
 # Serve the static frontend (production only, when dist/ exists)
 _dist = Path(__file__).parent.parent / "frontend" / "dist"
 if _dist.exists():
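
Reviewer sketch (not part of the commit): the new `/api/logs/{name}` handler reads the whole log file and slices off the last ten lines. An alternative, shown here only as an assumption about what might suit large log files, is to stream the file through a bounded `collections.deque`, which keeps at most `count` lines in memory. The name `tail_lines` is hypothetical. Unlike `splitlines()`, this variant only strips the trailing `\n` of each line.

```python
from collections import deque
from pathlib import Path


def tail_lines(path: Path, count: int = 10) -> list[str]:
    """Return the last `count` lines of a UTF-8 text file, or [] if it is missing.

    Hypothetical alternative to read_text().splitlines()[-10:] from the diff:
    the deque discards older lines while the file is being read.
    """
    try:
        with path.open(encoding="utf-8") as handle:
            return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
    except FileNotFoundError:
        return []
```

A missing file yields an empty list, which matches the behaviour of the handler in the diff.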
--- a/backend/database.py
+++ b/backend/database.py
@@ -1,12 +1,20 @@
 import os
+from pathlib import Path
 from dotenv import load_dotenv
 from sqlalchemy import create_engine

-load_dotenv(dotenv_path=os.path.join(os.path.dirname(__file__), '..', '.env'))
+load_dotenv(dotenv_path=Path(__file__).resolve().parent.parent / ".env")
 from sqlalchemy.orm import sessionmaker, declarative_base

 DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///./test.db")

+# Always resolve relative SQLite paths relative to this file, not to the cwd
+if DATABASE_URL.startswith("sqlite:///") and not DATABASE_URL.startswith("sqlite:////"):
+    db_path = DATABASE_URL[len("sqlite:///"):]
+    if not os.path.isabs(db_path):
+        db_path = str(Path(__file__).resolve().parent / db_path)
+        DATABASE_URL = f"sqlite:///{db_path}"
+
 if "sqlite" in DATABASE_URL:
    engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
 else:
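
Reviewer sketch (not part of the commit): the path handling added to `database.py` can be isolated in a small pure function, which makes the behaviour easier to check. The name `resolve_sqlite_url` and the `base_dir` parameter are hypothetical. The `:memory:` guard is an addition that the diff does not contain. Without it, `sqlite:///:memory:` would apparently be rewritten into a file path next to the module.

```python
import os
from pathlib import Path


def resolve_sqlite_url(url: str, base_dir: Path) -> str:
    """Anchor a relative SQLite URL at base_dir; leave every other URL unchanged.

    Hypothetical helper mirroring the logic in the database.py hunk.
    """
    prefix = "sqlite:///"
    # Non-SQLite URLs and absolute POSIX paths (four slashes) pass through.
    if not url.startswith(prefix) or url.startswith("sqlite:////"):
        return url
    db_path = url[len(prefix):]
    # Assumption beyond the diff: keep in-memory databases untouched.
    if db_path == ":memory:" or os.path.isabs(db_path):
        return url
    return f"{prefix}{base_dir / db_path}"
```

In the diff, `base_dir` corresponds to `Path(__file__).resolve().parent`, so the database file no longer depends on the directory the process was started from.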
--- a/backend/init_db.py
+++ b/backend/init_db.py
@@ -13,8 +13,6 @@ def init_db():
    db = SessionLocal()
    if not get_setting(db, "ollama_url"):
        set_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
-    if not get_setting(db, "default_model"):
-        set_setting(db, "default_model", os.getenv("DEFAULT_MODEL", "llama3"))
    db.close()

    print("Database initialized.")
--- a/backend/main.py
+++ b/backend/main.py
@@ -1,5 +1,8 @@
+import json
 import logging
 import os
+import secrets
+import time
 from logging.handlers import RotatingFileHandler
 from pathlib import Path

@@ -49,10 +52,16 @@ def _last_user_msg(messages: list, max_len: int = 120) -> str:

 async def require_api_key(request: Request, db: Session = Depends(get_db)):
    auth_header = request.headers.get("Authorization", "")
+    x_api_key = request.headers.get("x-api-key", "")
+    auth_token = request.headers.get("anthropic-auth-token", "")
    if auth_header.startswith("Bearer "):
        api_key = auth_header[7:]
    elif auth_header.startswith("sk-"):
        api_key = auth_header
+    elif x_api_key:
+        api_key = x_api_key
+    elif auth_token:
+        api_key = auth_token
    else:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")
    db_key = crud.verify_api_key(db, api_key)
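
Reviewer sketch (not part of the commit): the header precedence introduced in `require_api_key` reads more easily as a pure function. The name `extract_api_key` is hypothetical, and the sketch assumes a plain mapping with lower-cased header names instead of the framework's case-insensitive header object. The order of the checks follows the `if`/`elif` chain in the hunk above.

```python
from typing import Mapping, Optional


def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the API key from the first matching header, or None.

    Hypothetical helper; the diff performs these checks inline.
    Precedence: Authorization (Bearer or bare sk-), x-api-key, anthropic-auth-token.
    """
    auth = headers.get("authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    if auth.startswith("sk-"):
        return auth
    # An empty header value counts as missing, as in the diff.
    return headers.get("x-api-key") or headers.get("anthropic-auth-token") or None
```

A `None` result corresponds to the branch in the diff that raises the 401 response.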
@@ -70,8 +79,6 @@ def apply_env_settings():
    try:
        if url := os.getenv("OLLAMA_URL"):
            crud.set_setting(db, "ollama_url", url)
-        if model := os.getenv("DEFAULT_MODEL"):
-            crud.set_setting(db, "default_model", model)
        db.commit()
    finally:
        db.close()
@@ -82,15 +89,25 @@ async def unhandled_exception_handler(request: Request, exc: Exception):
                    request.method, request.url.path, type(exc).__name__, exc, exc_info=exc)
    return JSONResponse(status_code=500, content={"error": {"message": "Internal server error", "type": "server_error"}})

+def _backend_headers() -> dict:
+    key = os.getenv("BACKEND_API_KEY")
+    return {"Authorization": f"Bearer {key}"} if key else {}
+
+
 async def proxy_request(url: str, method: str = "GET", json_data: dict = None):
    async with httpx.AsyncClient(timeout=300.0) as client:
-        response = await client.request(method=method, url=url, json=json_data)
+        response = await client.request(method=method, url=url, json=json_data, headers=_backend_headers())
        return response

 @app.post("/api/generate")
 async def generate(request: Request, db: Session = Depends(get_db)):
    ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
    body = await request.json()
+    force_model = crud.get_setting(db, "force_model") or None
+    if force_model:
+        body = {**body, "model": force_model}
+    if not body.get("model"):
+        raise HTTPException(status_code=422, detail="Field 'model' is required")
    prompt_tokens = crud.count_tokens(body.get("prompt", ""))

    if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
@@ -99,9 +116,15 @@ async def generate(request: Request, db: Session = Depends(get_db)):
    prompt_preview = (body.get("prompt", "").replace("\n", " ").strip())[:120]
    usage_log.info('%s | /api/generate | %s | ~%d tokens | "%s"',
                   request.state.api_key_name, body.get("model", "?"), prompt_tokens, prompt_preview)
+    start = time.monotonic()
    try:
        response = await proxy_request(f"{ollama_url}/api/generate", method="POST", json_data=body)
-        return JSONResponse(content=response.json(), status_code=response.status_code)
+        resp_json = response.json()
+        usage_log.info('%s | /api/generate | %s | actual ↑%d ↓%d tokens | %.1fs',
+                       request.state.api_key_name, body.get("model", "?"),
+                       resp_json.get("prompt_eval_count", 0), resp_json.get("eval_count", 0),
+                       time.monotonic() - start)
+        return JSONResponse(content=resp_json, status_code=response.status_code)
    except Exception as exc:
        error_log.error("Proxy error | %s | /api/generate | %s | %s: %s",
                        request.state.api_key_name, body.get("model", "?"), type(exc).__name__, exc, exc_info=exc)
@@ -111,6 +134,11 @@ async def generate(request: Request, db: Session = Depends(get_db)):
 async def chat(request: Request, db: Session = Depends(get_db)):
    ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
    body = await request.json()
+    force_model = crud.get_setting(db, "force_model") or None
+    if force_model:
+        body = {**body, "model": force_model}
+    if not body.get("model"):
+        raise HTTPException(status_code=422, detail="Field 'model' is required")
    messages = body.get("messages", [])
    prompt_tokens = sum(crud.count_tokens(_content_to_str(msg.get("content"))) for msg in messages)

@@ -119,9 +147,15 @@ async def chat(request: Request, db: Session = Depends(get_db)):

    usage_log.info('%s | /api/chat | %s | ~%d tokens | "%s"',
                   request.state.api_key_name, body.get("model", "?"), prompt_tokens, _last_user_msg(messages))
+    start = time.monotonic()
    try:
        response = await proxy_request(f"{ollama_url}/api/chat", method="POST", json_data=body)
-        return JSONResponse(content=response.json(), status_code=response.status_code)
+        resp_json = response.json()
+        usage_log.info('%s | /api/chat | %s | actual ↑%d ↓%d tokens | %.1fs',
+                       request.state.api_key_name, body.get("model", "?"),
+                       resp_json.get("prompt_eval_count", 0), resp_json.get("eval_count", 0),
+                       time.monotonic() - start)
+        return JSONResponse(content=resp_json, status_code=response.status_code)
    except Exception as exc:
        error_log.error("Proxy error | %s | /api/chat | %s | %s: %s",
                        request.state.api_key_name, body.get("model", "?"), type(exc).__name__, exc, exc_info=exc)
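
Reviewer sketch (not part of the commit): the same five model-lock lines now appear in both `generate` and `chat`. One way to avoid the duplication would be a shared helper. The name `apply_model_lock` is hypothetical, and it raises `ValueError` where the handlers raise an `HTTPException` with status 422, so the sketch stays free of framework imports.

```python
from typing import Optional


def apply_model_lock(body: dict, force_model: Optional[str]) -> dict:
    """Return a request body with the locked model applied, if a lock is set.

    Hypothetical helper mirroring the duplicated block in the diff.
    The input dict is copied, never mutated.
    """
    if force_model:
        body = {**body, "model": force_model}
    if not body.get("model"):
        # The handlers in the diff answer with HTTP 422 at this point.
        raise ValueError("Field 'model' is required")
    return body
```

An empty string counts as "no lock", which matches how the admin API stores a cleared setting as `""`.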
@@ -133,12 +167,226 @@ async def list_models(db: Session = Depends(get_db)):
    response = await proxy_request(f"{ollama_url}/api/tags", method="GET")
    return JSONResponse(content=response.json(), status_code=response.status_code)

+@app.get("/version")
+async def version():
+    return {"version": os.getenv("APP_VERSION", "dev")}
+
+@app.get("/api/ps")
+async def running_models(db: Session = Depends(get_db)):
+    ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
+    response = await proxy_request(f"{ollama_url}/api/ps", method="GET")
+    return JSONResponse(content=response.json(), status_code=response.status_code)
+
@app.get("/api/versions")
 async def versions(db: Session = Depends(get_db)):
    ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
    response = await proxy_request(f"{ollama_url}/api/versions", method="GET")
    return JSONResponse(content=response.json(), status_code=response.status_code)

+
+# --- Anthropic Messages API compatibility layer ---
+
+def _anthropic_content_to_str(content) -> str:
+    """Flatten Anthropic content (string or block array) to a plain string."""
+    if isinstance(content, str):
+        return content
+    if isinstance(content, list):
+        parts = []
+        for block in content:
+            if not isinstance(block, dict):
+                continue
+            if block.get("type") == "text":
+                parts.append(block.get("text", ""))
+            elif block.get("type") == "tool_result":
+                raw = block.get("content", "")
+                if isinstance(raw, list):
+                    raw = " ".join(r.get("text", "") for r in raw if isinstance(r, dict) and r.get("type") == "text")
+                parts.append(str(raw))
+        return " ".join(parts)
+    return str(content) if content else ""
+
+
+def _anthropic_messages_to_ollama(messages: list, system: str = None) -> list:
+    """Transform Anthropic messages array to Ollama /api/chat format."""
+    result = []
+    if system:
+        result.append({"role": "system", "content": system})
+    for msg in messages:
+        role = msg.get("role")
+        content = msg.get("content")
+        if role == "assistant" and isinstance(content, list):
+            text = " ".join(b.get("text", "") for b in content if isinstance(b, dict) and b.get("type") == "text")
+            tool_calls = [
+                {"function": {"name": b["name"], "arguments": b.get("input", {})}}
+                for b in content if isinstance(b, dict) and b.get("type") == "tool_use"
+            ]
+            entry = {"role": "assistant", "content": text}
+            if tool_calls:
+                entry["tool_calls"] = tool_calls
+            result.append(entry)
+        elif role == "user" and isinstance(content, list):
+            text_parts = []
+            for block in content:
+                if not isinstance(block, dict):
+                    continue
+                if block.get("type") == "tool_result":
+                    if text_parts:
+                        result.append({"role": "user", "content": " ".join(text_parts)})
+                        text_parts = []
+                    raw = block.get("content", "")
+                    if isinstance(raw, list):
+                        raw = " ".join(r.get("text", "") for r in raw if isinstance(r, dict) and r.get("type") == "text")
+                    result.append({"role": "tool", "content": str(raw)})
+                elif block.get("type") == "text":
+                    text_parts.append(block.get("text", ""))
+            if text_parts:
+                result.append({"role": "user", "content": " ".join(text_parts)})
+        else:
+            result.append({"role": role, "content": _anthropic_content_to_str(content)})
+    return result
+
+
+def _anthropic_tools_to_ollama(tools: list) -> list:
+    """Transform Anthropic tools to Ollama/OpenAI function format."""
+    return [
+        {
+            "type": "function",
+            "function": {
+                "name": t["name"],
+                "description": t.get("description", ""),
+                "parameters": t.get("input_schema", {}),
+            },
+        }
+        for t in tools
+    ]
+
+
+def _ollama_to_anthropic_response(ollama_resp: dict, model_name: str, msg_id: str) -> dict:
+    """Transform an Ollama /api/chat response to Anthropic Messages API format."""
+    msg = ollama_resp.get("message", {})
+    text = msg.get("content", "")
+    tool_calls = msg.get("tool_calls") or []
+
+    content_blocks = []
+    if text:
+        content_blocks.append({"type": "text", "text": text})
+
+    stop_reason = "end_turn"
+    for i, tc in enumerate(tool_calls):
+        stop_reason = "tool_use"
+        fn = tc.get("function", {})
+        args = fn.get("arguments", {})
+        if isinstance(args, str):
+            try:
+                args = json.loads(args)
+            except json.JSONDecodeError:
+                args = {}
+        content_blocks.append({
+            "type": "tool_use",
+            "id": f"toolu_{msg_id}_{i}",
+            "name": fn.get("name", ""),
+            "input": args,
+        })
+
+    return {
+        "id": f"msg_{msg_id}",
+        "type": "message",
+        "role": "assistant",
+        "content": content_blocks,
+        "model": model_name,
+        "stop_reason": stop_reason,
+        "stop_sequence": None,
+        "usage": {
+            "input_tokens": ollama_resp.get("prompt_eval_count", 0),
+            "output_tokens": ollama_resp.get("eval_count", 0),
+        },
+    }
+
+
+@app.post("/v1/messages")
+async def anthropic_messages(request: Request, db: Session = Depends(get_db)):
+    ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
+    body = await request.json()
+
+    force_model = crud.get_setting(db, "force_model") or None
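+    # Precedence: admin lock (force_model), then ANTHROPIC_DEFAULT_MODEL, then the client's model.
+    # When ANTHROPIC_DEFAULT_MODEL is set, it overrides whatever model name the client sent.
+    model_name = force_model or os.getenv("ANTHROPIC_DEFAULT_MODEL") or body.get("model")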
+    if not model_name:
+        raise HTTPException(status_code=422, detail="Field 'model' is required")
+
+    anthropic_msgs = body.get("messages", [])
+    system = body.get("system")
+
+    system_str = _anthropic_content_to_str(system) if system else ""
+    # Join with a separator so the system prompt does not fuse with the first message.
+    all_text = " ".join([system_str] + [_anthropic_content_to_str(m.get("content")) for m in anthropic_msgs])
+    prompt_tokens = crud.count_tokens(all_text)
+
+    if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
+        raise HTTPException(status_code=429, detail="Quota exceeded")
+
+    ollama_messages = _anthropic_messages_to_ollama(anthropic_msgs, system=system_str)
+    ollama_body: dict = {"model": model_name, "messages": ollama_messages, "stream": body.get("stream", False)}
+    if tools := body.get("tools"):
+        ollama_body["tools"] = _anthropic_tools_to_ollama(tools)
+
+    msg_id = secrets.token_hex(12)
+    target = f"{ollama_url}/api/chat"
+
+    usage_log.info('%s | /v1/messages | %s | ~%d tokens | "%s"',
+                   request.state.api_key_name, model_name, prompt_tokens, _last_user_msg(ollama_messages))
+    start = time.monotonic()
+
+    if body.get("stream"):
+        # The backend is always called non-streaming; the dev proxy builds the SSE stream itself.
+        # This is necessary because upstream proxies (e.g. the production proxy) expose /api/chat
+        # only in non-streaming mode.
+        non_stream_body = {**ollama_body, "stream": False}
+
+        async def generate():
+            try:
+                response = await proxy_request(target, method="POST", json_data=non_stream_body)
+                ollama_resp = response.json()
+            except Exception as exc:
+                error_log.error("Stream error | %s | /v1/messages | %s | %s: %s",
+                                request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
+                raise
+
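+            # Only text is relayed in streaming mode: tool_calls in the backend response are
+            # not emitted, and stop_reason is always reported as "end_turn".
+            msg = ollama_resp.get("message", {})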
+            text = msg.get("content", "")
+            input_tokens = ollama_resp.get("prompt_eval_count", 0)
+            output_tokens = ollama_resp.get("eval_count", 0)
+
+            yield f"event: message_start\ndata: {json.dumps({'type': 'message_start', 'message': {'id': f'msg_{msg_id}', 'type': 'message', 'role': 'assistant', 'content': [], 'model': model_name, 'stop_reason': None, 'stop_sequence': None, 'usage': {'input_tokens': input_tokens, 'output_tokens': 0}}})}\n\n"
+            yield f"event: content_block_start\ndata: {json.dumps({'type': 'content_block_start', 'index': 0, 'content_block': {'type': 'text', 'text': ''}})}\n\n"
+            yield f"event: ping\ndata: {json.dumps({'type': 'ping'})}\n\n"
+            if text:
+                yield f"event: content_block_delta\ndata: {json.dumps({'type': 'content_block_delta', 'index': 0, 'delta': {'type': 'text_delta', 'text': text}})}\n\n"
+            yield f"event: content_block_stop\ndata: {json.dumps({'type': 'content_block_stop', 'index': 0})}\n\n"
+            yield f"event: message_delta\ndata: {json.dumps({'type': 'message_delta', 'delta': {'stop_reason': 'end_turn', 'stop_sequence': None}, 'usage': {'output_tokens': output_tokens}})}\n\n"
+            yield f"event: message_stop\ndata: {json.dumps({'type': 'message_stop'})}\n\n"
+            usage_log.info('%s | /v1/messages | %s | actual ↑%d ↓%d tokens | %.1fs',
+                           request.state.api_key_name, model_name,
+                           input_tokens, output_tokens,
+                           time.monotonic() - start)
+
+        return StreamingResponse(
+            generate(),
+            media_type="text/event-stream",
+            headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+        )
+
+    try:
+        response = await proxy_request(target, method="POST", json_data=ollama_body)
+        result = _ollama_to_anthropic_response(response.json(), model_name, msg_id)
+        usage_log.info('%s | /v1/messages | %s | actual ↑%d ↓%d tokens | %.1fs',
+                       request.state.api_key_name, model_name,
+                       result["usage"]["input_tokens"], result["usage"]["output_tokens"],
+                       time.monotonic() - start)
+        return JSONResponse(content=result, status_code=response.status_code)
+    except Exception as exc:
+        error_log.error("Proxy error | %s | /v1/messages | %s | %s: %s",
+                        request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
+        raise
+
@app.get("/v1/models")
 async def list_openai_models(db: Session = Depends(get_db)):
    ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
@ -148,44 +396,73 @@ async def list_openai_models(db: Session = Depends(get_db)):
@app.post("/v1/chat/completions")
 async def openai_chat_completions(request: Request, db: Session = Depends(get_db)):
    ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
-    default_model = crud.get_setting(db, "default_model", os.getenv("DEFAULT_MODEL", "llama3"))

    body = await request.json()
+    force_model = crud.get_setting(db, "force_model") or None
+    if force_model:
+        body = {**body, "model": force_model}
+    if not body.get("model"):
+        raise HTTPException(status_code=422, detail="Field 'model' is required")
    messages = body.get("messages", [])
    prompt_tokens = sum(crud.count_tokens(_content_to_str(msg.get("content"))) for msg in messages)

    if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
        raise HTTPException(status_code=429, detail="Quota exceeded")

-    if "model" not in body:
-        body = {**body, "model": default_model}
-
    model_name = body["model"]
+
    usage_log.info('%s | /v1/chat/completions | %s | ~%d tokens | "%s"',
                   request.state.api_key_name, model_name, prompt_tokens, _last_user_msg(messages))

    target = f"{ollama_url}/v1/chat/completions"

    if body.get("stream"):
+        existing_opts = body.get("stream_options") or {}
+        stream_body = {**body, "stream_options": {**existing_opts, "include_usage": True}}
+        start = time.monotonic()
+        usage_tokens = {"prompt": 0, "completion": 0}
+
        async def generate():
            try:
                async with httpx.AsyncClient(timeout=300.0) as client:
-                    async with client.stream("POST", target, json=body) as resp:
+                    async with client.stream("POST", target, json=stream_body, headers=_backend_headers()) as resp:
                        async for chunk in resp.aiter_bytes():
+                            try:
+                                for line in chunk.decode("utf-8", errors="ignore").splitlines():
+                                    if line.startswith("data: ") and "[DONE]" not in line:
+                                        data = json.loads(line[6:])
+                                        if u := data.get("usage"):
+                                            usage_tokens["prompt"] = u.get("prompt_tokens", 0)
+                                            usage_tokens["completion"] = u.get("completion_tokens", 0)
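+                            except Exception:
+                                # Best-effort usage sniffing: a parse failure (e.g. an SSE line split
+                                # across chunks) skips the rest of this chunk; it is still forwarded.
+                                pass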
                            yield chunk
            except Exception as exc:
                error_log.error("Stream error | %s | /v1/chat/completions | %s | %s: %s",
                                request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
                raise
+            finally:
+                usage_log.info('%s | /v1/chat/completions | %s | actual ↑%d ↓%d tokens | %.1fs',
+                               request.state.api_key_name, model_name,
+                               usage_tokens["prompt"], usage_tokens["completion"],
+                               time.monotonic() - start)
+
        return StreamingResponse(
            generate(),
            media_type="text/event-stream",
            headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
        )

+    start = time.monotonic()
    try:
        response = await proxy_request(target, method="POST", json_data=body)
-        return JSONResponse(content=response.json(), status_code=response.status_code)
+        resp_json = response.json()
+        usage = resp_json.get("usage", {})
+        usage_log.info('%s | /v1/chat/completions | %s | actual ↑%d ↓%d tokens | %.1fs',
+                       request.state.api_key_name, model_name,
+                       usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0),
+                       time.monotonic() - start)
+        return JSONResponse(content=resp_json, status_code=response.status_code)
    except Exception as exc:
        error_log.error("Proxy error | %s | /v1/chat/completions | %s | %s: %s",
                        request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
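The streaming branch of `/v1/chat/completions` above decodes each network chunk on its own, so an SSE `data:` line that straddles two chunks cannot be parsed and its usage figures may be lost. Below is a minimal sketch of one way to avoid that, assuming OpenAI-style `data: {...}` lines. `UsageSniffer` and `feed` are hypothetical names and not part of this change.

```python
import json


class UsageSniffer:
    """Collect token usage from an OpenAI-style SSE byte stream, chunk by chunk.

    Hypothetical helper for illustration; it only observes the stream.
    """

    def __init__(self) -> None:
        self._buffer = b""
        self.prompt = 0
        self.completion = 0

    def feed(self, chunk: bytes) -> None:
        # Keep the trailing partial line in the buffer until its newline arrives.
        self._buffer += chunk
        *lines, self._buffer = self._buffer.split(b"\n")
        for raw in lines:
            line = raw.decode("utf-8", errors="ignore").strip()
            if not line.startswith("data: ") or line == "data: [DONE]":
                continue
            try:
                data = json.loads(line[len("data: "):])
            except json.JSONDecodeError:
                continue  # not JSON; ignore this line and keep scanning
            usage = data.get("usage") if isinstance(data, dict) else None
            if usage:
                self.prompt = usage.get("prompt_tokens", 0)
                self.completion = usage.get("completion_tokens", 0)
```

Inside `generate()`, one would call `sniffer.feed(chunk)` before `yield chunk`. The forwarded bytes stay untouched, and buffering raw bytes rather than decoded text also copes with a multi-byte character split across chunks.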
--- a/backend/schemas.py
+++ b/backend/schemas.py
@ -40,7 +40,7 @@ class QuotaUpdate(BaseModel):

 class Settings(BaseModel):
    ollama_url: str
-    default_model: str
+    force_model: Optional[str] = None

 class UsageStats(BaseModel):
    tokens_used_today: int = 0
--- a/backend/tests/test_admin_logs.py
+++ b/backend/tests/test_admin_logs.py
@ -0,0 +1,59 @@
+import os
+import pytest
+from fastapi.testclient import TestClient
+
+os.environ.setdefault("ADMIN_PASSWORD", "test-admin-pw")
+os.environ.setdefault("OLLAMA_URL", "http://127.0.0.1:9999")
+
+
+@pytest.fixture
+def client(tmp_path):
+    log_file = tmp_path / "usage.log"
+    log_file.write_text("\n".join(f"Zeile {i}" for i in range(1, 16)) + "\n")
+    (tmp_path / "error.log").write_text("Fehler A\nFehler B\n")
+    os.environ["LOG_FILE"] = str(log_file)
+
+    from database import Base, engine
+    Base.metadata.drop_all(bind=engine)
+    Base.metadata.create_all(bind=engine)
+
+    from admin import app
+    yield TestClient(app, raise_server_exceptions=False)
+
+    Base.metadata.drop_all(bind=engine)
+    os.environ.pop("LOG_FILE", None)
+
+
+AUTH = {"Authorization": "Bearer test-admin-pw"}
+
+
+def test_logs_usage_returns_last_10_lines(client):
+    resp = client.get("/api/logs/usage", headers=AUTH)
+    assert resp.status_code == 200
+    lines = resp.json()["lines"]
+    assert len(lines) == 10
+    assert lines[-1] == "Zeile 15"
+    assert lines[0] == "Zeile 6"
+
+
+def test_logs_error_returns_content(client):
+    resp = client.get("/api/logs/error", headers=AUTH)
+    assert resp.status_code == 200
+    assert resp.json()["lines"] == ["Fehler A", "Fehler B"]
+
+
+def test_logs_missing_file_returns_empty(client, tmp_path):
+    os.environ["LOG_FILE"] = str(tmp_path / "nonexistent.log")
+    resp = client.get("/api/logs/usage", headers=AUTH)
+    assert resp.status_code == 200
+    assert resp.json()["lines"] == []
+
+
+def test_logs_invalid_name_returns_400(client):
+    resp = client.get("/api/logs/secret", headers=AUTH)
+    assert resp.status_code == 400
+
+
+def test_logs_requires_auth(client):
+    resp = client.get("/api/logs/usage")
+    assert resp.status_code == 401
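The tests above pin down the contract of `/api/logs/{name}`: the last 10 lines of the log, and an empty list when the file is missing. The handler itself is not part of this excerpt. Below is a minimal sketch of how such a tail could be read, with `tail_lines` as a hypothetical helper name and UTF-8 log files as an assumption.

```python
from collections import deque
from pathlib import Path


def tail_lines(path: str, limit: int = 10) -> list[str]:
    """Return the last `limit` lines of a text file, or [] if it does not exist."""
    file = Path(path)
    if not file.is_file():
        return []
    with file.open(encoding="utf-8", errors="replace") as handle:
        # deque(maxlen=...) keeps only the newest `limit` lines while streaming the file.
        return [line.rstrip("\n") for line in deque(handle, maxlen=limit)]
```

Streaming the file through a bounded `deque` avoids loading a large log into memory just to discard most of it.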
--- a/backend/tests/test_anthropic_messages.py
+++ b/backend/tests/test_anthropic_messages.py
@ -0,0 +1,272 @@
+import json
+import os
+from unittest.mock import AsyncMock, MagicMock, patch
+
+
+def _make_body(model="llama3", messages=None, stream=False, **kwargs):
+    body = {
+        "model": model,
+        "messages": messages or [{"role": "user", "content": "Hello"}],
+        "max_tokens": 100,
+    }
+    if stream:
+        body["stream"] = True
+    body.update(kwargs)
+    return body
+
+
+def _ollama_chat_response(content="Hi!", input_tokens=5, output_tokens=3):
+    return {
+        "model": "llama3",
+        "message": {"role": "assistant", "content": content},
+        "prompt_eval_count": input_tokens,
+        "eval_count": output_tokens,
+        "done": True,
+    }
+
+
+# --- Auth ---
+
+def test_messages_missing_auth_returns_401(test_client):
+    response = test_client.post("/v1/messages", json=_make_body())
+    assert response.status_code == 401
+
+
+def test_messages_invalid_key_returns_401(test_client):
+    response = test_client.post(
+        "/v1/messages",
+        headers={"x-api-key": "sk-invalid"},
+        json=_make_body(),
+    )
+    assert response.status_code == 401
+
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_accepts_anthropic_auth_token_header(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: _ollama_chat_response()
+    response = test_client.post(
+        "/v1/messages",
+        headers={"anthropic-auth-token": os.environ.get("TEST_API_KEY", "")},
+        json=_make_body(),
+    )
+    assert response.status_code == 200
+
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_accepts_x_api_key_header(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: _ollama_chat_response()
+    response = test_client.post(
+        "/v1/messages",
+        headers={"x-api-key": os.environ.get("TEST_API_KEY", "")},
+        json=_make_body(),
+    )
+    assert response.status_code == 200
+
+
+# --- Validation ---
+
+def test_messages_missing_model_returns_422(test_client):
+    env = {k: v for k, v in os.environ.items() if k != "ANTHROPIC_DEFAULT_MODEL"}
+    with patch.dict(os.environ, env, clear=True):
+        response = test_client.post(
+            "/v1/messages",
+            headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+            json={"messages": [{"role": "user", "content": "Hi"}], "max_tokens": 100},
+        )
+    assert response.status_code == 422
+
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_anthropic_default_model_used_when_no_model_in_request(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: _ollama_chat_response()
+    with patch.dict(os.environ, {"ANTHROPIC_DEFAULT_MODEL": "qwen3-coder:q8_0"}):
+        test_client.post(
+            "/v1/messages",
+            headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+            json={"messages": [{"role": "user", "content": "Hi"}], "max_tokens": 100},
+        )
+    sent_body = mock_proxy.call_args[1]["json_data"]
+    assert sent_body["model"] == "qwen3-coder:q8_0"
+
+
+# --- Quota ---
+
+def test_messages_quota_exceeded_returns_429(test_client):
+    with patch("main.crud.check_and_increment_quota", return_value=False):
+        response = test_client.post(
+            "/v1/messages",
+            headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+            json=_make_body(),
+        )
+        assert response.status_code == 429
+
+
+# --- Response format ---
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_returns_anthropic_format(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: _ollama_chat_response("Hello!")
+    response = test_client.post(
+        "/v1/messages",
+        headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+        json=_make_body(),
+    )
+    assert response.status_code == 200
+    data = response.json()
+    assert data["type"] == "message"
+    assert data["role"] == "assistant"
+    assert isinstance(data["content"], list)
+    assert data["content"][0]["type"] == "text"
+    assert data["content"][0]["text"] == "Hello!"
+    assert data["usage"]["input_tokens"] == 5
+    assert data["usage"]["output_tokens"] == 3
+
+
+# --- Request transformation ---
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_system_prompt_becomes_first_system_message(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: _ollama_chat_response()
+    test_client.post(
+        "/v1/messages",
+        headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+        json=_make_body(system="You are helpful"),
+    )
+    sent_body = mock_proxy.call_args[1]["json_data"]
+    assert sent_body["messages"][0]["role"] == "system"
+    assert sent_body["messages"][0]["content"] == "You are helpful"
+
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_tools_transformed_to_ollama_function_format(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: _ollama_chat_response()
+    test_client.post(
+        "/v1/messages",
+        headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+        json=_make_body(tools=[{
+            "name": "bash",
+            "description": "Run bash",
+            "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}},
+        }]),
+    )
+    sent_body = mock_proxy.call_args[1]["json_data"]
+    assert sent_body["tools"][0]["type"] == "function"
+    assert sent_body["tools"][0]["function"]["name"] == "bash"
+    assert "parameters" in sent_body["tools"][0]["function"]
+
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_tool_call_response_transformed_to_anthropic(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: {
+        "model": "llama3",
+        "message": {
+            "role": "assistant",
+            "content": "",
+            "tool_calls": [{"function": {"name": "bash", "arguments": {"command": "ls"}}}],
+        },
+        "prompt_eval_count": 10,
+        "eval_count": 5,
+        "done": True,
+    }
+    response = test_client.post(
+        "/v1/messages",
+        headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+        json=_make_body(),
+    )
+    data = response.json()
+    assert data["stop_reason"] == "tool_use"
+    tool_block = next(b for b in data["content"] if b["type"] == "tool_use")
+    assert tool_block["name"] == "bash"
+    assert tool_block["input"] == {"command": "ls"}
+
+
+# --- Streaming ---
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_streaming_returns_anthropic_sse_events(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: {
+        "model": "llama3",
+        "message": {"role": "assistant", "content": "Hi!"},
+        "prompt_eval_count": 5,
+        "eval_count": 3,
+        "done": True,
+    }
+
+    response = test_client.post(
+        "/v1/messages",
+        headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+        json=_make_body(stream=True),
+    )
+
+    assert response.status_code == 200
+    events = [
+        json.loads(line[6:])
+        for line in response.text.splitlines()
+        if line.startswith("data: ")
+    ]
+    event_types = [e["type"] for e in events]
+    assert "message_start" in event_types
+    assert "content_block_start" in event_types
+    assert "content_block_delta" in event_types
+    assert "message_stop" in event_types
+
+    deltas = [e for e in events if e["type"] == "content_block_delta"]
+    text = "".join(d["delta"]["text"] for d in deltas)
+    assert text == "Hi!"
+
+
+# --- Backend auth (BACKEND_API_KEY) ---
+
+def test_proxy_request_forwards_backend_api_key(test_client):
+    with patch("main.httpx.AsyncClient") as mock_cls:
+        mock_response = MagicMock()
+        mock_response.status_code = 200
+        mock_response.json.return_value = {"result": "ok"}
+
+        mock_instance = AsyncMock()
+        mock_instance.__aenter__ = AsyncMock(return_value=mock_instance)
+        mock_instance.__aexit__ = AsyncMock(return_value=False)
+        mock_instance.request = AsyncMock(return_value=mock_response)
+        mock_cls.return_value = mock_instance
+
+        with patch.dict(os.environ, {"BACKEND_API_KEY": "sk-backend-secret"}):
+            test_client.post(
+                "/api/generate",
+                headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+                json={"model": "llama3", "prompt": "hi"},
+            )
+
+        _, kwargs = mock_instance.request.call_args
+        assert kwargs.get("headers", {}).get("Authorization") == "Bearer sk-backend-secret"
+
+
+def test_proxy_request_omits_auth_header_when_no_backend_key(test_client):
+    with patch("main.httpx.AsyncClient") as mock_cls:
+        mock_response = MagicMock()
+        mock_response.status_code = 200
+        mock_response.json.return_value = {"result": "ok"}
+
+        mock_instance = AsyncMock()
+        mock_instance.__aenter__ = AsyncMock(return_value=mock_instance)
+        mock_instance.__aexit__ = AsyncMock(return_value=False)
+        mock_instance.request = AsyncMock(return_value=mock_response)
+        mock_cls.return_value = mock_instance
+
+        env_without_key = {k: v for k, v in os.environ.items() if k != "BACKEND_API_KEY"}
+        with patch.dict(os.environ, env_without_key, clear=True):
+            test_client.post(
+                "/api/generate",
+                headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+                json={"model": "llama3", "prompt": "hi"},
+            )
+
+        _, kwargs = mock_instance.request.call_args
+        assert "Authorization" not in kwargs.get("headers", {})
--- a/build_push.sh
+++ b/build_push.sh
@ -37,6 +37,7 @@ echo ""
 docker buildx build \
    --platform "$PLATFORM" \
    --push \
+    --build-arg APP_VERSION="$VERSION" \
    -t "$IMAGE:$VERSION" \
    -t "$IMAGE:latest" \
    .
--- a/frontend/src/main.jsx
+++ b/frontend/src/main.jsx
@ -76,14 +76,17 @@ const EMPTY_KEY_FORM = {
  name: '', expires_at: '', daily_tokens: '', monthly_tokens: '', daily_requests: '', monthly_requests: '',
 };

-function SettingsSection({ password }) {
+function SettingsSection({ password, refreshKey }) {
  const [settings, setSettings] = useState(null);
  const [availableModels, setAvailableModels] = useState([]);
  const [modelsLoading, setModelsLoading] = useState(false);
  const [ollamaReachable, setOllamaReachable] = useState(true);
  const [proxyEndpoint, setProxyEndpoint] = useState(null);
+  const [appVersion, setAppVersion] = useState(null);
  const [saved, setSaved] = useState(false);
  const [error, setError] = useState(null);
+  const [usageLog, setUsageLog] = useState([]);
+  const [errorLog, setErrorLog] = useState([]);

  const fetchModels = async (url, currentModel) => {
    setModelsLoading(true);
@ -95,8 +98,8 @@ function SettingsSection({ password }) {
      const { models, reachable } = res.data;
      setOllamaReachable(reachable);
      setAvailableModels(models);
-      if (models.length > 0 && !models.includes(currentModel)) {
-        setSettings(s => ({ ...s, default_model: models[0] }));
+      if (models.length > 0 && currentModel && !models.includes(currentModel)) {
+        setSettings(s => ({ ...s, force_model: models[0] }));
      }
    } catch {
      setOllamaReachable(false);
@ -108,16 +111,25 @@ function SettingsSection({ password }) {

  useEffect(() => {
    const headers = authHeaders(password);
-    Promise.all([
+    Promise.allSettled([
      axios.get('/api/settings', { headers }),
      axios.get('/api/proxy-info', { headers }),
-    ]).then(([settingsRes, proxyRes]) => {
-      const s = settingsRes.data;
+      axios.get('/api/logs/usage', { headers }),
+      axios.get('/api/logs/error', { headers }),
+    ]).then(([settingsRes, proxyRes, usageRes, errorRes]) => {
+      if (settingsRes.status === 'rejected' || proxyRes.status === 'rejected') {
+        setError('Einstellungen konnten nicht geladen werden.');
+        return;
+      }
+      const s = settingsRes.value.data;
      setSettings(s);
-      setProxyEndpoint(proxyRes.data.endpoint);
-      fetchModels(s.ollama_url, s.default_model);
-    }).catch(() => setError('Einstellungen konnten nicht geladen werden.'));
-  }, []);
+      setProxyEndpoint(proxyRes.value.data.endpoint);
+      setAppVersion(proxyRes.value.data.version);
+      if (usageRes.status === 'fulfilled') setUsageLog(usageRes.value.data.lines);
+      if (errorRes.status === 'fulfilled') setErrorLog(errorRes.value.data.lines);
+      fetchModels(s.ollama_url, s.force_model);
+    });
+  }, [refreshKey]);

  const handleSave = async (e) => {
    e.preventDefault();
@ -145,6 +157,10 @@ function SettingsSection({ password }) {
            <small> (Änderung erfordert Neustart)</small>
          </span>
        </div>
+        <div className="settings-row">
+          <label>Version</label>
+          <span className="settings-value">{appVersion ?? '…'}</span>
+        </div>
        <div className="settings-row">
          <label>Ollama-Endpunkt</label>
          <div className="settings-input-wrap">
@ -152,7 +168,7 @@ function SettingsSection({ password }) {
              type="url"
              value={settings.ollama_url}
              onChange={(e) => setSettings({ ...settings, ollama_url: e.target.value })}
-              onBlur={(e) => fetchModels(e.target.value, settings.default_model)}
+              onBlur={(e) => fetchModels(e.target.value, settings.force_model)}
              placeholder="http://localhost:11434"
              required
            />
@ -162,30 +178,40 @@ function SettingsSection({ password }) {
          </div>
        </div>
        <div className="settings-row">
-          <label>Standard-Modell</label>
+          <label>Aktives Modell (Lock)</label>
          {modelsLoading ? (
            <span className="settings-value">Lade Modelle…</span>
          ) : availableModels.length > 0 ? (
            <select
-              value={settings.default_model}
-              onChange={(e) => setSettings({ ...settings, default_model: e.target.value })}
+              value={settings.force_model || ""}
+              onChange={(e) => setSettings({ ...settings, force_model: e.target.value || null })}
            >
+              <option value="">— kein Lock —</option>
              {availableModels.map(m => <option key={m} value={m}>{m}</option>)}
            </select>
          ) : (
            <input
              type="text"
-              value={settings.default_model}
-              onChange={(e) => setSettings({ ...settings, default_model: e.target.value })}
-              placeholder="llama3"
-              required
+              value={settings.force_model || ""}
+              onChange={(e) => setSettings({ ...settings, force_model: e.target.value || null })}
+              placeholder="leer = kein Lock"
            />
          )}
        </div>
        {error && <div className="error">{error}</div>}
        {saved && <div className="success">Gespeichert.</div>}
-        <button type="submit">Speichern</button>
+        <button type="submit">Änderungen übernehmen</button>
      </form>
+      <div className="log-section">
+        <h3>Nutzungslog (letzte 10 Einträge)</h3>
+        <pre className="log-pre">{usageLog.length > 0 ? usageLog.join('\n') : '— keine Einträge —'}</pre>
+        {errorLog.length > 0 && (
+          <>
+            <h3>Fehlerlog (letzte 10 Einträge)</h3>
+            <pre className="log-pre log-pre-error">{errorLog.join('\n')}</pre>
+          </>
+        )}
+      </div>
    </section>
  );
 }
@ -200,21 +226,31 @@ function App() {
  const [creating, setCreating] = useState(false);
  const [editKey, setEditKey] = useState(null);
  const [editForm, setEditForm] = useState({});
-
-  useEffect(() => {
-    if (!password) { setLoading(false); return; }
-    fetchApiKeys().finally(() => setLoading(false));
-  }, [password]);
+  const [refreshKey, setRefreshKey] = useState(0);
+  const [lastUpdated, setLastUpdated] = useState(null);

  const fetchApiKeys = async () => {
    try {
      const res = await axios.get('/api/api-keys', { headers: authHeaders(password) });
      setApiKeys(res.data);
+      setLastUpdated(new Date());
    } catch {
      setError('API-Keys konnten nicht geladen werden.');
    }
  };

+  useEffect(() => {
+    if (!password) { setLoading(false); return; }
+    fetchApiKeys().finally(() => setLoading(false));
+
+    const timer = setInterval(() => {
+      fetchApiKeys();
+      setRefreshKey(k => k + 1);
+    }, 5 * 60 * 1000);
+
+    return () => clearInterval(timer);
+  }, [password]);
+
  const handleCreate = async (e) => {
    e.preventDefault();
    setCreating(true);
@ -295,6 +331,7 @@ function App() {

  const logout = () => {
    sessionStorage.removeItem('admin_password');
+    setLastUpdated(null);
    setPassword(null);
  };

@ -306,10 +343,17 @@ function App() {
    <div className="container">
      <div className="header">
        <h1>Ollama Proxy Admin</h1>
+        <div className="header-right">
+          {lastUpdated && (
+            <span className="last-updated">
+              Aktualisiert: {lastUpdated.toLocaleTimeString('de-DE', { hour: '2-digit', minute: '2-digit' })}
+            </span>
+          )}
          <button onClick={logout}>Abmelden</button>
        </div>
+      </div>

-      <SettingsSection password={password} />
+      <SettingsSection password={password} refreshKey={refreshKey} />

      <section>
        <h2>Neuer API-Key</h2>
--- a/frontend/src/styles.css
+++ b/frontend/src/styles.css
@ -182,6 +182,7 @@ tr:hover {
 .settings-row label {
  width: 160px;
  flex-shrink: 0;
+  font-size: 14px;
  font-weight: 500;
  color: #2c3e50;
 }
@ -408,7 +409,7 @@ tr:hover {
 .edit-form label small {
  font-weight: 400;
  color: #999;
-  font-size: 11px;
+  font-size: 12px;
 }

 .edit-form input {
@ -452,3 +453,46 @@ tr:hover {
 .btn-cancel:hover {
  background: #7f8c8d;
 }
+
+.log-section {
+  margin-top: 24px;
+  border-top: 1px solid #eee;
+  padding-top: 16px;
+}
+
+.log-section h3 {
+  font-size: 14px;
+  font-weight: 600;
+  color: #34495e;
+  margin: 0 0 6px;
+}
+
+.log-pre {
+  background: #1e2a35;
+  color: #c8d6df;
+  font-family: 'Menlo', 'Consolas', monospace;
+  font-size: 11px;
+  line-height: 1.6;
+  padding: 10px 14px;
+  border-radius: 4px;
+  margin: 0 0 14px;
+  overflow-x: auto;
+  white-space: pre;
+}
+
+.log-pre-error {
+  background: #2d1b1b;
+  color: #f5a0a0;
+  margin-bottom: 0;
+}
+
+.header-right {
+  display: flex;
+  align-items: center;
+  gap: 16px;
+}
+
+.last-updated {
+  font-size: 12px;
+  color: #95a5a6;
+}
--- a/frontend/vite.config.js
+++ b/frontend/vite.config.js
@ -11,6 +11,7 @@ export default defineConfig({
      '/api/settings': 'http://localhost:8001',
      '/api/ollama-models': 'http://localhost:8001',
      '/api/proxy-info': 'http://localhost:8001',
+      '/api/logs': 'http://localhost:8001',
      '/api': 'http://localhost:8000',
    },
  },
--- a/run_tests.py
+++ b/run_tests.py
@ -1,8 +0,0 @@
-#!/usr/bin/env python3
-"""Pytest runner for Ollama Proxy tests."""
-import subprocess
-import sys
-
-if __name__ == "__main__":
-    result = subprocess.run([sys.executable, "-m", "pytest"] + sys.argv[1:], cwd="backend")
-    sys.exit(result.returncode)
--- a/start.sh
+++ b/start.sh
@ -1,17 +1,19 @@
 #!/bin/bash

+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+
 # Load .env
-if [ -f .env ]; then
+if [ -f "$SCRIPT_DIR/.env" ]; then
    set -a
-    source .env
+    source "$SCRIPT_DIR/.env"
    set +a
 fi

 # Activate the virtual environment if present
-if [ -f .venv/bin/activate ]; then
-    source .venv/bin/activate
-elif [ -f venv/bin/activate ]; then
-    source venv/bin/activate
+if [ -f "$SCRIPT_DIR/.venv/bin/activate" ]; then
+    source "$SCRIPT_DIR/.venv/bin/activate"
+elif [ -f "$SCRIPT_DIR/venv/bin/activate" ]; then
+    source "$SCRIPT_DIR/venv/bin/activate"
 fi

 if [ -z "$ADMIN_PASSWORD" ]; then
--- a/start_claude.sh
+++ b/start_claude.sh
@ -0,0 +1,33 @@
+#!/bin/bash
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+
+# Load .env
+if [ -f "$SCRIPT_DIR/.env" ]; then
+    set -a
+    source "$SCRIPT_DIR/.env"
+    set +a
+fi
+
+# API key: the first argument takes precedence, otherwise the PROXY_API_KEY environment variable
+API_KEY="${1:-$PROXY_API_KEY}"
+
+if [ -z "$API_KEY" ]; then
+    echo "Fehler: Kein API-Key angegeben."
+    echo "Verwendung: ./start_claude.sh sk-dein-key"
+    echo "       oder: PROXY_API_KEY=sk-dein-key ./start_claude.sh"
+    exit 1
+fi
+
+# 0.0.0.0 is a bind address, not a valid client host
+PROXY_HOST="${PROXY_HOST:-0.0.0.0}"
+PROXY_PORT="${PROXY_PORT:-8000}"
+if [ "$PROXY_HOST" = "0.0.0.0" ]; then
+    PROXY_HOST="localhost"
+fi
+
+export ANTHROPIC_BASE_URL="http://${PROXY_HOST}:${PROXY_PORT}"
+export ANTHROPIC_AUTH_TOKEN="$API_KEY"
+
+echo "Verbinde mit Proxy: $ANTHROPIC_BASE_URL"
+exec claude
--- a/test_api.sh
+++ b/test_api.sh
@ -1,7 +0,0 @@
-curl -X POST http://localhost:8000/api/generate \
-  -H "Authorization: sk-admin-key" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "llama3",
-    "prompt": "Test"
-  }'
Author	SHA1	Message	Date
Oliver Hofmann	92ed7368eb	Remove redundant run_tests.py wrapper	2026-05-10 11:51:13 +02:00
Oliver Hofmann	5a50d0be04	Remove obsolete test_api.sh	2026-05-10 11:47:15 +02:00
Oliver Hofmann	21cab46365	Remove docs/ from tracking, add to gitignore	2026-05-10 11:44:10 +02:00
Oliver Hofmann	fcaea9e3a9	Add license section to DockerHub descriptions	2026-05-10 11:21:39 +02:00
Oliver Hofmann	eb83c52b7f	Rename settings save button to 'Änderungen übernehmen'	2026-05-10 10:50:09 +02:00
Oliver Hofmann	f551b2a421	Harmonize typography: remove log uppercase, normalize label font sizes	2026-05-10 10:45:41 +02:00
Oliver Hofmann	79a30dd179	Route /api/logs to admin API in Vite proxy config	2026-05-10 10:40:41 +02:00
Oliver Hofmann	0353e0299f	Use Promise.allSettled so log fetch failures don't break settings display	2026-05-10 10:34:28 +02:00
Oliver Hofmann	5a94fc6d90	Reset lastUpdated on logout	2026-05-10 10:22:33 +02:00
Oliver Hofmann	cf1b3f7786	Add 5-minute auto-reload and last-updated timestamp to admin UI	2026-05-10 10:20:03 +02:00
Oliver Hofmann	02b4ad06ca	Fix pre whitespace, log-pre-error margin, error log heading	2026-05-10 10:18:27 +02:00
Oliver Hofmann	ca55783b90	Show last 10 log lines in settings section	2026-05-10 10:15:48 +02:00
Oliver Hofmann	a9b0168c71	Add GET /api/logs/{name} endpoint to admin API	2026-05-10 10:11:55 +02:00
Oliver Hofmann	7ce4d3a895	Add implementation plan: log viewer and auto-reload	2026-05-10 10:09:34 +02:00
Oliver Hofmann	fff9d1048d	Add design spec: log viewer and auto-reload for admin UI	2026-05-10 10:07:44 +02:00
Oliver Hofmann	cdd55880d6	Remove unnecessary bold formatting from Anthropic API feature entries	2026-05-10 10:00:34 +02:00
Oliver Hofmann	6b2ae4b072	Remove BACKEND_API_KEY from public documentation	2026-05-10 09:58:50 +02:00
Oliver Hofmann	4c8a8d4afb	Update Kurzanleitung: add gemma4:e4b, Claude Code section	2026-05-10 09:56:35 +02:00
Oliver Hofmann	9872175fb0	Add LICENSE, update docs with Anthropic endpoint and free-claude-code attribution	2026-05-10 09:53:12 +02:00
Oliver Hofmann	cc3ee5a03c	Add Anthropic Messages API compatibility layer (/v1/messages): POST /v1/messages endpoint with full quota enforcement and auth; accepts x-api-key and anthropic-auth-token headers (for Claude Code); transforms Anthropic request/response format ↔ Ollama /api/chat; streaming support via Anthropic SSE format; tool use support (request and response transformation); ANTHROPIC_DEFAULT_MODEL env var for model selection without admin UI; BACKEND_API_KEY env var for forwarding auth to upstream proxies; fix SQLite path always resolved relative to database.py location; start.sh and start_claude.sh load .env relative to script location	2026-05-10 09:45:38 +02:00
Oliver Hofmann	70fd61608b	Log actual tokens and elapsed time for all endpoints incl. streaming. For streaming /v1/chat/completions: inject stream_options.include_usage, parse usage from SSE chunks, log actual ↑↓ tokens and wall time in the generator's finally block. Add elapsed time to all second log entries.	2026-05-08 09:47:32 +02:00
Oliver Hofmann	07f6fec4bf	Show app version in admin UI and /version endpoint. Embed APP_VERSION build arg in Docker image (default: dev). build_push.sh passes the git tag as build arg. Proxy exposes GET /version, admin UI shows it as read-only field in settings.	2026-05-08 09:30:23 +02:00
Oliver Hofmann	6761a73364	Add /api/ps example to Kurzanleitung	2026-05-08 09:25:44 +02:00
Oliver Hofmann	0d1ce96c99	Expose /api/ps to show currently loaded model	2026-05-08 09:22:17 +02:00
Oliver Hofmann	b16b3af44d	Mention VPN as alias for Intranet in Kurzanleitung	2026-05-08 09:17:08 +02:00
Oliver Hofmann	fdbe0a74e8	Exclude .tex and .pdf from Docker build context	2026-05-08 08:41:49 +02:00
Oliver Hofmann	0154c89c6b	Improve Python examples and opencode description in Kurzanleitung. Split second example into focused model listing only, remove repeated prompt code. Replace TUI with 'interaktive Terminal-Oberfläche'.	2026-05-08 08:40:25 +02:00
Oliver Hofmann	e3dbed9f5e	Ignore generated KURZANLEITUNG.tex and .pdf	2026-05-08 08:15:07 +02:00
Oliver Hofmann	5b37718120	Fix list indentation in KURZANLEITUNG.md	2026-05-08 08:07:02 +02:00
Oliver Hofmann	f823e7d314	Return 422 when model field is missing and no force_model is set. Prevents an opaque Ollama error from reaching the client by failing fast with a clear message before the request is forwarded.	2026-05-08 08:04:52 +02:00
Oliver Hofmann	34b108f4df	Replace default_model with force_model (model lock). Removes DEFAULT_MODEL in favour of a force_model setting configurable via the admin UI. When set, every proxy request's model field is overridden, preventing uncoordinated model switches during lab sessions. Updates schemas, admin API, all three proxy endpoints, frontend, init_db, and docs (README, DOCKERHUB, KURZANLEITUNG).	2026-05-08 08:02:16 +02:00
Oliver Hofmann	cced65693c	Log actual Ollama token counts and add user guide. Add a second usage log line after each proxy response with actual ↑prompt ↓completion token counts from Ollama (prompt_eval_count/eval_count for native endpoints, usage object for OpenAI endpoint). Also adds KURZANLEITUNG.md for students and colleagues covering API access, model selection, Python examples, opencode setup, and quota/admin information.	2026-05-08 07:21:36 +02:00
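
The model-lock behaviour described in commits 34b108f4df and f823e7d314 (a configured force_model overrides the request's model field; a request without a model and without a lock is rejected with 422) might be sketched roughly as below. This is a minimal illustration, not the project's actual backend code: `resolve_model` and `MissingModelError` are hypothetical names, and the mapping of the exception to an HTTP 422 response is assumed to happen in the calling endpoint.

```python
from typing import Optional


class MissingModelError(ValueError):
    """Hypothetical error; a proxy endpoint would translate it into HTTP 422."""


def resolve_model(requested: Optional[str], force_model: Optional[str]) -> str:
    """Pick the model name to forward upstream (illustrative helper only)."""
    # A configured lock always wins over whatever the client asked for.
    if force_model:
        return force_model
    # Without a lock, the client has to name a model itself.
    if requested:
        return requested
    # Fail fast here instead of forwarding a request the upstream would reject.
    raise MissingModelError("model field is required when no force_model is set")
```

An empty string is treated like an unset lock, which matches the admin UI in the diff, where an empty selection is saved as `null`.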