Remove redundant run_tests.py wrapper

Remove obsolete test_api.sh
Remove docs/ from tracking, add to gitignore
2026-05-10 11:51:13 +02:00 · 2026-05-10 11:47:15 +02:00 · 2026-05-10 11:44:10 +02:00 · 2026-05-10 11:21:39 +02:00 · 2026-05-10 10:50:09 +02:00 · 2026-05-10 10:45:41 +02:00
21 changed files with 926 additions and 66 deletions
--- a/.env.example
+++ b/.env.example
@@ -16,3 +16,8 @@ DATABASE_URL=sqlite:///./test.db
 OLLAMA_URL=http://localhost:11434
 DEFAULT_MODEL=llama3
 APP_TZ=Europe/Berlin
+
+# Default model for the Anthropic-compatible endpoint (/v1/messages).
+# If set, it replaces the model name sent by the client (Claude Code sends
+# Anthropic names such as claude-opus-4-7 that no local model matches);
+# only the admin model lock (force_model) takes precedence over it.
+ANTHROPIC_DEFAULT_MODEL=llama3
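The `/v1/messages` handler in `backend/main.py` picks the Ollama model in a fixed order: the admin model lock `force_model` first, then `ANTHROPIC_DEFAULT_MODEL`, then the `model` field of the request body. A minimal sketch of that order as a standalone function; `resolve_model` is a hypothetical name, since in the proxy the logic is written inline in `anthropic_messages`:

```python
from typing import Optional


def resolve_model(force_model: Optional[str], default_model: Optional[str],
                  requested: Optional[str]) -> str:
    """Return the Ollama model for a /v1/messages request (hypothetical helper).

    Precedence: admin model lock > ANTHROPIC_DEFAULT_MODEL > request body.
    Empty strings count as "not set", like the `or` chain in the proxy.
    """
    model = force_model or default_model or requested
    if not model:
        # The proxy answers this case with HTTP 422
        raise ValueError("Field 'model' is required")
    return model


# Claude Code sends Anthropic model names that Ollama does not know:
print(resolve_model(None, "llama3", "claude-opus-4-7"))          # llama3
print(resolve_model("gemma4:31b", "llama3", "claude-opus-4-7"))  # gemma4:31b
```

Because Claude Code always sends Anthropic model names, `ANTHROPIC_DEFAULT_MODEL` is effectively required for that client unless a model lock is active.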
--- a/.gitignore
+++ b/.gitignore
@@ -30,4 +30,7 @@ config.json

 # Generated documents
 KURZANLEITUNG.tex
-KURZANLEITUNG.pdf
+KURZANLEITUNG.pdf
+
+# Internal planning docs
+docs/
--- a/DOCKERHUB.en.md
+++ b/DOCKERHUB.en.md
@@ -1,10 +1,11 @@
 # mediaeng/llmproxy

-A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with configurable token and request quotas. Incoming requests in OpenAI-compatible format are authenticated, checked against the quota, and forwarded to the configured Ollama server.
+A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with configurable token and request quotas. Incoming requests in OpenAI-compatible or Anthropic-compatible format are authenticated, checked against the quota, and forwarded to the configured Ollama server.

 ## Features

 - OpenAI-compatible endpoint (`/v1/chat/completions`, `/v1/models`)
+- Anthropic Messages API (`/v1/messages`) — compatible with Claude Code CLI and Anthropic SDK clients
 - API key management with daily and monthly token/request limits
 - Web-based admin interface (port 8001)
 - Model lock: enforces a specific model for all requests (useful for courses and lab sessions)
@@ -17,7 +18,7 @@ A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API ke

 | Port | Service |
 |------|---------|
-| `8000` | Proxy endpoint (OpenAI API) |
+| `8000` | Proxy endpoint (OpenAI and Anthropic API) |
 | `8001` | Admin API + web interface |

 All API endpoints require the `ADMIN_PASSWORD` — without a valid token, only the public frontend files (HTML/JS/CSS of the login page) are accessible. The password is therefore the primary protection.
@@ -35,6 +36,7 @@ All API endpoints require the `ADMIN_PASSWORD` — without a valid token, only t
 | `ADMIN_PORT` | `8001` | Admin API port |
 | `APP_TZ` | `Europe/Berlin` | Timezone for daily/monthly quota resets |
 | `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
+| `ANTHROPIC_DEFAULT_MODEL` | – | Default model for `/v1/messages` (Ollama model name, e.g. `llama3`) |

 ## Docker Compose – Ollama on the Host (Linux, recommended)

@@ -60,6 +62,7 @@ volumes:
 ADMIN_PASSWORD=changeme
 OLLAMA_URL=http://localhost:11434
 APP_TZ=Europe/Berlin
+ANTHROPIC_DEFAULT_MODEL=llama3
 ```

 ## Docker Compose – Ollama as Container, SQLite
@@ -77,8 +80,8 @@ services:
    environment:
      ADMIN_PASSWORD: changeme
      OLLAMA_URL: http://ollama:11434
-
      APP_TZ: Europe/Berlin
+      ANTHROPIC_DEFAULT_MODEL: llama3
    volumes:
      - llmproxy-data:/app/backend
    depends_on:
@@ -110,9 +113,9 @@ services:
    environment:
      ADMIN_PASSWORD: changeme
      OLLAMA_URL: http://ollama:11434
-
      APP_TZ: Europe/Berlin
      DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
+      ANTHROPIC_DEFAULT_MODEL: llama3
    depends_on:
      db:
        condition: service_healthy
@@ -147,9 +150,23 @@ volumes:

 ## Client Configuration

-Configure the proxy as an OpenAI-compatible endpoint:
-
+**OpenAI-compatible client:**
 ```
 Base URL:  http://<host>:8000/v1
 API Key:   <API key created in the admin interface>
 ```
+
+**Claude Code CLI:**
+```bash
+ANTHROPIC_BASE_URL=http://<host>:8000 \
+ANTHROPIC_AUTH_TOKEN=<API key created in the admin interface> \
+claude
+```
+
+## Acknowledgements
+
+The Anthropic Messages API endpoint (`/v1/messages`) was inspired by [free-claude-code](https://github.com/Alishahryar1/free-claude-code) by Ali Khokhar, which pursues a similar approach for routing Claude Code requests to alternative LLM backends.
+
+## License
+
+MIT — © 2026 Oliver Hofmann. See [LICENSE](https://git.efi.th-nuernberg.de/gitea/hofmannol/llmproxy/src/branch/main/LICENSE) for details.
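The proxy accepts the key created in the admin interface from several headers and checks them in a fixed order (see `require_api_key` in `backend/main.py`): `Authorization: Bearer <key>`, a bare `sk-` key in `Authorization`, then `x-api-key`, then `anthropic-auth-token`. Anthropic SDK clients normally send `x-api-key`; Claude Code, to our understanding, sends `ANTHROPIC_AUTH_TOKEN` as a Bearer token. A minimal sketch of the lookup over a plain dict instead of a Starlette request; `extract_api_key` is a hypothetical helper:

```python
from typing import Mapping, Optional


def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the API key from request headers, or None if none is present.

    Mirrors the lookup order of require_api_key. Header names are compared
    case-insensitively, as HTTP requires.
    """
    h = {name.lower(): value for name, value in headers.items()}
    auth = h.get("authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    if auth.startswith("sk-"):
        return auth
    # Anthropic-style headers; an empty value counts as missing
    return h.get("x-api-key") or h.get("anthropic-auth-token") or None


print(extract_api_key({"Authorization": "Bearer sk-abc"}))     # sk-abc
print(extract_api_key({"x-api-key": "sk-abc"}))                # sk-abc
print(extract_api_key({"Content-Type": "application/json"}))  # None
```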
--- a/DOCKERHUB.md
+++ b/DOCKERHUB.md
@@ -1,10 +1,11 @@
 # mediaeng/llmproxy

-A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with configurable token and request quotas. Incoming requests in OpenAI-compatible format are authenticated, checked against the quota, and forwarded to the configured Ollama server.
+A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with configurable token and request quotas. Incoming requests in OpenAI-compatible or Anthropic-compatible format are authenticated, checked against the quota, and forwarded to the configured Ollama server.

 ## Features

 - OpenAI-compatible endpoint (`/v1/chat/completions`, `/v1/models`)
+- Anthropic Messages API (`/v1/messages`), compatible with Claude Code CLI and Anthropic SDK clients
 - API key management with daily and monthly token/request limits
 - Web-based admin interface (port 8001)
 - Model lock: enforces a specific model for all requests (useful for courses and lab sessions)
@@ -17,7 +18,7 @@ A lightweight reverse proxy for [Ollama](https://ollama.com) that manages API keys with

 | Port | Service |
 |------|--------|
-| `8000` | Proxy endpoint (OpenAI API) |
+| `8000` | Proxy endpoint (OpenAI and Anthropic API) |
 | `8001` | Admin API + web interface |

 All API endpoints require the `ADMIN_PASSWORD`: without a valid token, only the public frontend files (HTML/JS/CSS of the login page) are accessible. The password is therefore the primary protection.
@@ -35,6 +36,7 @@ All API endpoints require the `ADMIN_PASSWORD`: without a valid token, only the
 | `ADMIN_PORT` | `8001` | Admin API port |
 | `APP_TZ` | `Europe/Berlin` | Timezone for daily/monthly quota resets |
 | `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
+| `ANTHROPIC_DEFAULT_MODEL` | – | Default model for `/v1/messages` (Ollama model name, e.g. `llama3`) |

 ## Docker Compose – Ollama on the Host (Linux, recommended)

@@ -60,6 +62,7 @@ volumes:
 ADMIN_PASSWORD=changeme
 OLLAMA_URL=http://localhost:11434
 APP_TZ=Europe/Berlin
+ANTHROPIC_DEFAULT_MODEL=llama3
 ```

 ## Docker Compose – Ollama as Container, SQLite
@@ -77,8 +80,8 @@ services:
    environment:
      ADMIN_PASSWORD: changeme
      OLLAMA_URL: http://ollama:11434
-
      APP_TZ: Europe/Berlin
+      ANTHROPIC_DEFAULT_MODEL: llama3
    volumes:
      - llmproxy-data:/app/backend
    depends_on:
@@ -110,9 +113,9 @@ services:
    environment:
      ADMIN_PASSWORD: changeme
      OLLAMA_URL: http://ollama:11434
-
      APP_TZ: Europe/Berlin
      DATABASE_URL: postgresql://llmproxy:secret@db:5432/llmproxy
+      ANTHROPIC_DEFAULT_MODEL: llama3
    depends_on:
      db:
        condition: service_healthy
@@ -147,9 +150,19 @@ volumes:

 ## Client Configuration

-Configure the proxy as an OpenAI-compatible endpoint:
-
+**OpenAI-compatible client:**
 ```
 Base URL:  http://<host>:8000/v1
 API Key:   <API key created in the admin interface>
 ```
+
+**Claude Code CLI:**
+```bash
+ANTHROPIC_BASE_URL=http://<host>:8000 \
+ANTHROPIC_AUTH_TOKEN=<API key> \
+claude
+```
+
+## License
+
+MIT, © 2026 Oliver Hofmann. See [LICENSE](https://git.efi.th-nuernberg.de/gitea/hofmannol/llmproxy/src/branch/main/LICENSE) for details.
--- a/Dockerfile
+++ b/Dockerfile
@@ -6,6 +6,8 @@ COPY frontend/ frontend/
 RUN npm run build --prefix frontend

 FROM python:3.12-slim
+ARG APP_VERSION=dev
+ENV APP_VERSION=$APP_VERSION
 WORKDIR /app

 COPY backend/requirements.txt .
--- a/KURZANLEITUNG.md
+++ b/KURZANLEITUNG.md
@@ -14,7 +14,7 @@ Typical use cases:

 ## Access

-The service is reachable **only from the intranet**.
+The service is reachable **only from the intranet (VPN)**.

 | | |
 |---|---|
@@ -27,6 +27,7 @@ The service is reachable **only from the intranet**.

 | Model | Size | Note |
 |---|---|---|
+| `gemma4:e4b` | 9.6 GB | very fast, for simple tasks |
 | `gemma4:31b` | 19 GB | compact, fast |
 | `gpt-oss:20b` | 13 GB | compact, fast |
 | `gpt-oss:120b` | 65 GB | very capable |
@@ -86,6 +87,24 @@ for m in models.data:

 ---

+## Checking the currently loaded model
+
+Since only one model can be held in memory at a time, the following call shows which model is currently loaded:
+
+```python
+import httpx
+
+r = httpx.get(
+    "http://141.75.33.244:8000/api/ps",
+    headers={"Authorization": "Bearer sk-..."}
+)
+print(r.json())
+```
+
+The response contains the model name, its size, and how long the model will remain in memory.
+
+---
+
 ## Usage recommendations

 - **Start with a small model** (`gemma4:31b` or `gpt-oss:20b`): much faster and sufficient for many tasks.
@@ -147,6 +166,26 @@ opencode opens an interactive terminal interface and can then in the project

 ---

+## Coding assistant: Claude Code
+
+[Claude Code](https://claude.ai/code) is Anthropic's official AI coding agent for the terminal. Anyone who already has Claude Code access can route it through the intranet service to the local models, so the model requests are processed locally and not sent to Anthropic.
+
+### Prerequisite
+
+An active Claude Code plan (Claude Pro or Team).
+
+### Starting
+
+```bash
+ANTHROPIC_BASE_URL=http://141.75.33.244:8000 \
+ANTHROPIC_AUTH_TOKEN=sk-... \
+claude
+```
+
+The model is preconfigured by the admin via `ANTHROPIC_DEFAULT_MODEL`, so no manual model selection is needed.
+
+---
+
 ## Administration (admins only)

 The web interface for managing API keys and quotas is available at:
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,27 @@
+MIT License
+
+Copyright (c) 2026 Oliver Hofmann
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+---
+
+Portions of this software were inspired by free-claude-code
+(https://github.com/Alishahryar1/free-claude-code),
+copyright (c) 2026 Ali Khokhar, MIT License.
--- a/README.md
+++ b/README.md
@@ -4,12 +4,13 @@ Ollama provides no authentication of its own: anyone who can reach the API can

 ## Features

-- API key authentication (Bearer token or `sk-` prefix)
+- API key authentication (Bearer token, `sk-` prefix, `x-api-key` and `anthropic-auth-token` headers)
 - Optional expiry date per API key
 - Quota management with separate daily and monthly limits (tokens & requests)
-- Token counting via tiktoken, reset boundaries in the Europe/Berlin timezone
+- Token counting via tiktoken, reset boundaries in the configured timezone
 - Web admin interface (manage API keys, Ollama settings, usage display)
 - OpenAI-compatible `/v1/chat/completions` endpoint with streaming and tool use
+- Anthropic Messages API `/v1/messages`, compatible with Claude Code CLI and Anthropic SDK clients
 - Rotating usage logs
 - SQLite (default) or PostgreSQL
 - Docker image on DockerHub: `mediaeng/llmproxy`
@@ -35,6 +36,7 @@ DATABASE_URL=sqlite:///./test.db
 OLLAMA_URL=http://localhost:11434
 APP_TZ=Europe/Berlin
 LOG_FILE=logs/usage.log
+ANTHROPIC_DEFAULT_MODEL=llama3
 ```

 | Variable | Default | Description |
@@ -49,6 +51,7 @@ LOG_FILE=logs/usage.log
 | `APP_TZ` | `Europe/Berlin` | Timezone for daily/monthly quota resets |
 | `LOG_FILE` | `logs/usage.log` | Path of the rotating usage log file |
 | `ALLOWED_ORIGINS` | `http://localhost:5173` | CORS origins (only relevant for development) |
+| `ANTHROPIC_DEFAULT_MODEL` | – | Default model for `/v1/messages` (Ollama model name) |

 ## Development (local)

@@ -78,6 +81,23 @@ The script checks whether all ports are free, initializes the database and starts

 Admin interface: `http://localhost:5173`

+## Claude Code CLI
+
+The proxy provides an Anthropic-compatible endpoint through which the Claude Code CLI can be used with local Ollama models.
+
+```bash
+# Set ANTHROPIC_DEFAULT_MODEL in .env, then:
+./start_claude.sh
+
+# Or pass the key as an argument:
+./start_claude.sh sk-your-api-key
+
+# Or as an environment variable:
+PROXY_API_KEY=sk-your-api-key ./start_claude.sh
+```
+
+The script sets `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` automatically from `.env` and starts `claude`.
+
 ## Production (Docker)

 ### Docker Compose (recommended)
@@ -168,17 +188,26 @@ Clients then configure `https://llm.example.com/v1` as the base URL.

 ## Proxy endpoints (port 8000)

-All endpoints require a valid API key in the `Authorization` header.
+All endpoints require a valid API key in the `Authorization` header (`Bearer sk-...`), the `x-api-key` header, or the `anthropic-auth-token` header.

 ```bash
+# OpenAI-compatible endpoint
 curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer sk-xxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"Hallo"}]}'
+
+# Anthropic-compatible endpoint (e.g. for Claude Code)
+curl -X POST http://localhost:8000/v1/messages \
+  -H "x-api-key: sk-xxxxxx" \
+  -H "anthropic-version: 2023-06-01" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"llama3","messages":[{"role":"user","content":"Hallo"}],"max_tokens":1024}'
 ```

 | Endpoint | Method | Description |
 |----------|---------|--------------|
+| `/v1/messages` | POST | Chat (Anthropic format, streaming + tool use; the stream is emulated from a single non-streaming backend call) |
 | `/v1/chat/completions` | POST | Chat (OpenAI format, streaming + tool use) |
 | `/v1/models` | GET | Models (OpenAI format) |
 | `/api/generate` | POST | Ollama generate (native) |
@@ -226,7 +255,8 @@ llm_quota/
 │   └── tests/
 │       ├── conftest.py
 │       ├── test_auth.py
-│       └── test_quota.py
+│       ├── test_quota.py
+│       └── test_anthropic_messages.py
 ├── frontend/
 │   └── src/
 │       ├── main.jsx         # React admin UI
@@ -238,13 +268,19 @@ llm_quota/
 ├── docker-entrypoint.sh
 ├── .dockerignore
 ├── start.sh                 # Development start script
+├── start_claude.sh          # Start Claude Code CLI with the proxy
 ├── run_dev.py               # Development runner for PyCharm
 ├── build_push.sh            # Docker build & push to DockerHub
+├── LICENSE
 ├── DOCKERHUB.md             # DockerHub description (German)
 ├── DOCKERHUB.en.md          # DockerHub description (English)
 └── .gitignore
 ```

+## Acknowledgements
+
+The Anthropic-compatible endpoint (`/v1/messages`) was inspired by the project [free-claude-code](https://github.com/Alishahryar1/free-claude-code) by Ali Khokhar, which takes a similar approach to routing Claude Code requests to alternative LLM backends.
+
 ## License

-MIT
+MIT, see [LICENSE](LICENSE)
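With `"stream": true`, `/v1/messages` answers with Server-Sent Events: each frame carries an `event:` line and a `data:` line holding a JSON object, and frames are separated by a blank line. A minimal sketch for inspecting such a response offline, for example the output of the `curl` call above with `"stream": true` added to the body. `parse_sse` and `collect_text` are hypothetical helpers that assume newline-only line endings and a single-line `data:` field per frame, which is what the proxy emits; the sample body is hand-written, not captured output:

```python
import json


def parse_sse(body: str) -> list[tuple[str, dict]]:
    """Split an SSE body into (event, data) pairs, skipping incomplete frames."""
    events = []
    for frame in body.split("\n\n"):
        event, data = None, None
        for line in frame.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):])
        if event is not None and data is not None:
            events.append((event, data))
    return events


def collect_text(events: list[tuple[str, dict]]) -> str:
    """Join all text_delta fragments, i.e. the assistant's text answer."""
    return "".join(
        data["delta"]["text"]
        for event, data in events
        if event == "content_block_delta" and data["delta"].get("type") == "text_delta"
    )


sample = (
    'event: message_start\ndata: {"type": "message_start"}\n\n'
    'event: content_block_delta\ndata: {"type": "content_block_delta", "index": 0, '
    '"delta": {"type": "text_delta", "text": "Hello"}}\n\n'
    'event: message_stop\ndata: {"type": "message_stop"}\n\n'
)
events = parse_sse(sample)
print([name for name, _ in events])  # ['message_start', 'content_block_delta', 'message_stop']
print(collect_text(events))          # Hello
```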
--- a/backend/admin.py
+++ b/backend/admin.py
@@ -131,7 +131,10 @@ async def get_proxy_info(_ = Depends(require_admin_auth)):
    host = os.getenv("PROXY_HOST", "0.0.0.0")
    port = os.getenv("PROXY_PORT", "8000")
    display_host = "localhost" if host in ("0.0.0.0", "::") else host
-    return {"endpoint": f"http://{display_host}:{port}"}
+    return {
+        "endpoint": f"http://{display_host}:{port}",
+        "version": os.getenv("APP_VERSION", "dev"),
+    }

 @app.get("/api/settings", response_model=schemas.Settings)
 async def read_settings(db: Session = Depends(get_db), _ = Depends(require_admin_auth)):
@@ -166,6 +169,18 @@ async def get_ollama_models(
    except Exception:
        return {"models": [], "reachable": False}

+@app.get("/api/logs/{name}")
+async def get_log_lines(name: str, _ = Depends(require_admin_auth)):
+    if name not in ("usage", "error"):
+        raise HTTPException(status_code=400, detail="name must be 'usage' or 'error'")
+    log_file = Path(os.getenv("LOG_FILE", "logs/usage.log"))
+    path = log_file if name == "usage" else log_file.parent / "error.log"
+    try:
+        lines = path.read_text(encoding="utf-8").splitlines()
+        return {"lines": lines[-10:]}
+    except FileNotFoundError:
+        return {"lines": []}
+
 # Serve the static frontend (production only, when dist/ exists)
 _dist = Path(__file__).parent.parent / "frontend" / "dist"
 if _dist.exists():
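`get_log_lines` reads the whole log file into memory to return its last ten lines. That stays cheap while the files are small, which the `RotatingFileHandler` writing the usage log presumably ensures. If a log could grow large, a streaming variant is an option: a minimal sketch using `collections.deque` with `maxlen`, which keeps only the last `n` lines while iterating over the file. `tail` is a hypothetical helper, not part of the codebase:

```python
from collections import deque
from pathlib import Path


def tail(path: Path, n: int = 10) -> list[str]:
    """Return the last n lines of a UTF-8 text file, or [] if it does not exist."""
    try:
        with path.open(encoding="utf-8") as f:
            # deque(maxlen=n) drops older lines as newer ones are appended,
            # so memory use stays bounded by n lines
            return [line.rstrip("\n") for line in deque(f, maxlen=n)]
    except FileNotFoundError:
        return []
```

In the endpoint, `return {"lines": tail(path)}` would then replace the `read_text(...).splitlines()` call and the separate `FileNotFoundError` handling.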
--- a/backend/database.py
+++ b/backend/database.py
@@ -1,12 +1,20 @@
 import os
+from pathlib import Path
 from dotenv import load_dotenv
 from sqlalchemy import create_engine

-load_dotenv(dotenv_path=os.path.join(os.path.dirname(__file__), '..', '.env'))
+load_dotenv(dotenv_path=Path(__file__).resolve().parent.parent / ".env")
 from sqlalchemy.orm import sessionmaker, declarative_base

 DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///./test.db")

+# Always resolve relative SQLite paths against this file, not the current working directory
+if DATABASE_URL.startswith("sqlite:///") and not DATABASE_URL.startswith("sqlite:////"):
+    db_path = DATABASE_URL[len("sqlite:///"):]
+    if db_path != ":memory:" and not os.path.isabs(db_path):
+        db_path = str(Path(__file__).resolve().parent / db_path)
+        DATABASE_URL = f"sqlite:///{db_path}"
+
 if "sqlite" in DATABASE_URL:
    engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
 else:
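The new block in `backend/database.py` relies on SQLAlchemy's SQLite URL convention: `sqlite:///` followed by a relative path (three slashes) versus `sqlite:////` followed by an absolute path (four slashes). A minimal sketch of the same rewrite as a pure function that also leaves the in-memory URL `sqlite:///:memory:` untouched; `resolve_sqlite_url` is a hypothetical name and `base_dir` stands in for `Path(__file__).resolve().parent`:

```python
import os
from pathlib import Path

PREFIX = "sqlite:///"


def resolve_sqlite_url(url: str, base_dir: Path) -> str:
    """Anchor a relative SQLite path at base_dir; leave every other URL alone.

    sqlite:///test.db       -> relative path "test.db"
    sqlite:////data/llm.db  -> absolute path "/data/llm.db"
    """
    if not url.startswith(PREFIX):
        return url  # e.g. postgresql://...
    db_path = url[len(PREFIX):]
    if db_path == ":memory:" or os.path.isabs(db_path):
        return url  # in-memory database or already absolute
    return PREFIX + os.path.normpath(base_dir / db_path)


print(resolve_sqlite_url("sqlite:///./test.db", Path("/app/backend")))
# sqlite:////app/backend/test.db
```

Anchoring at the module's directory means `sqlite:///./test.db` points to the same file whether the app is started from the repository root, from `backend/`, or by a PyCharm run configuration.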
--- a/backend/main.py
+++ b/backend/main.py
@@ -1,5 +1,8 @@
+import json
 import logging
 import os
+import secrets
+import time
 from logging.handlers import RotatingFileHandler
 from pathlib import Path

@@ -49,10 +52,16 @@ def _last_user_msg(messages: list, max_len: int = 120) -> str:

 async def require_api_key(request: Request, db: Session = Depends(get_db)):
    auth_header = request.headers.get("Authorization", "")
+    x_api_key = request.headers.get("x-api-key", "")
+    auth_token = request.headers.get("anthropic-auth-token", "")
    if auth_header.startswith("Bearer "):
        api_key = auth_header[7:]
    elif auth_header.startswith("sk-"):
        api_key = auth_header
+    elif x_api_key:
+        api_key = x_api_key
+    elif auth_token:
+        api_key = auth_token
    else:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")
    db_key = crud.verify_api_key(db, api_key)
@@ -80,9 +89,14 @@ async def unhandled_exception_handler(request: Request, exc: Exception):
                    request.method, request.url.path, type(exc).__name__, exc, exc_info=exc)
    return JSONResponse(status_code=500, content={"error": {"message": "Internal server error", "type": "server_error"}})

+def _backend_headers() -> dict:
+    key = os.getenv("BACKEND_API_KEY")
+    return {"Authorization": f"Bearer {key}"} if key else {}
+
+
 async def proxy_request(url: str, method: str = "GET", json_data: dict = None):
    async with httpx.AsyncClient(timeout=300.0) as client:
-        response = await client.request(method=method, url=url, json=json_data)
+        response = await client.request(method=method, url=url, json=json_data, headers=_backend_headers())
        return response

 @app.post("/api/generate")
@@ -102,12 +116,14 @@ async def generate(request: Request, db: Session = Depends(get_db)):
    prompt_preview = (body.get("prompt", "").replace("\n", " ").strip())[:120]
    usage_log.info('%s | /api/generate | %s | ~%d tokens | "%s"',
                   request.state.api_key_name, body.get("model", "?"), prompt_tokens, prompt_preview)
+    start = time.monotonic()
    try:
        response = await proxy_request(f"{ollama_url}/api/generate", method="POST", json_data=body)
        resp_json = response.json()
-        usage_log.info('%s | /api/generate | %s | actual ↑%d ↓%d tokens',
+        usage_log.info('%s | /api/generate | %s | actual ↑%d ↓%d tokens | %.1fs',
                       request.state.api_key_name, body.get("model", "?"),
-                       resp_json.get("prompt_eval_count", 0), resp_json.get("eval_count", 0))
+                       resp_json.get("prompt_eval_count", 0), resp_json.get("eval_count", 0),
+                       time.monotonic() - start)
        return JSONResponse(content=resp_json, status_code=response.status_code)
    except Exception as exc:
        error_log.error("Proxy error | %s | /api/generate | %s | %s: %s",
@@ -131,12 +147,14 @@ async def chat(request: Request, db: Session = Depends(get_db)):

    usage_log.info('%s | /api/chat | %s | ~%d tokens | "%s"',
                   request.state.api_key_name, body.get("model", "?"), prompt_tokens, _last_user_msg(messages))
+    start = time.monotonic()
    try:
        response = await proxy_request(f"{ollama_url}/api/chat", method="POST", json_data=body)
        resp_json = response.json()
-        usage_log.info('%s | /api/chat | %s | actual ↑%d ↓%d tokens',
+        usage_log.info('%s | /api/chat | %s | actual ↑%d ↓%d tokens | %.1fs',
                       request.state.api_key_name, body.get("model", "?"),
-                       resp_json.get("prompt_eval_count", 0), resp_json.get("eval_count", 0))
+                       resp_json.get("prompt_eval_count", 0), resp_json.get("eval_count", 0),
+                       time.monotonic() - start)
        return JSONResponse(content=resp_json, status_code=response.status_code)
    except Exception as exc:
        error_log.error("Proxy error | %s | /api/chat | %s | %s: %s",
@@ -149,12 +167,226 @@ async def list_models(db: Session = Depends(get_db)):
    response = await proxy_request(f"{ollama_url}/api/tags", method="GET")
    return JSONResponse(content=response.json(), status_code=response.status_code)

+@app.get("/version")
+async def version():
+    return {"version": os.getenv("APP_VERSION", "dev")}
+
+@app.get("/api/ps")
+async def running_models(db: Session = Depends(get_db)):
+    ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
+    response = await proxy_request(f"{ollama_url}/api/ps", method="GET")
+    return JSONResponse(content=response.json(), status_code=response.status_code)
+
 @app.get("/api/versions")
 async def versions(db: Session = Depends(get_db)):
    ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
    response = await proxy_request(f"{ollama_url}/api/versions", method="GET")
    return JSONResponse(content=response.json(), status_code=response.status_code)

+
+# --- Anthropic Messages API compatibility layer ---
+
+def _anthropic_content_to_str(content) -> str:
+    """Flatten Anthropic content (string or block array) to a plain string."""
+    if isinstance(content, str):
+        return content
+    if isinstance(content, list):
+        parts = []
+        for block in content:
+            if not isinstance(block, dict):
+                continue
+            if block.get("type") == "text":
+                parts.append(block.get("text", ""))
+            elif block.get("type") == "tool_result":
+                raw = block.get("content", "")
+                if isinstance(raw, list):
+                    raw = " ".join(r.get("text", "") for r in raw if isinstance(r, dict) and r.get("type") == "text")
+                parts.append(str(raw))
+        return " ".join(parts)
+    return str(content) if content else ""
+
+
+def _anthropic_messages_to_ollama(messages: list, system: str = None) -> list:
+    """Transform Anthropic messages array to Ollama /api/chat format."""
+    result = []
+    if system:
+        result.append({"role": "system", "content": system})
+    for msg in messages:
+        role = msg.get("role")
+        content = msg.get("content")
+        if role == "assistant" and isinstance(content, list):
+            text = " ".join(b.get("text", "") for b in content if isinstance(b, dict) and b.get("type") == "text")
+            tool_calls = [
+                {"function": {"name": b["name"], "arguments": b.get("input", {})}}
+                for b in content if isinstance(b, dict) and b.get("type") == "tool_use"
+            ]
+            entry = {"role": "assistant", "content": text}
+            if tool_calls:
+                entry["tool_calls"] = tool_calls
+            result.append(entry)
+        elif role == "user" and isinstance(content, list):
+            text_parts = []
+            for block in content:
+                if not isinstance(block, dict):
+                    continue
+                if block.get("type") == "tool_result":
+                    if text_parts:
+                        result.append({"role": "user", "content": " ".join(text_parts)})
+                        text_parts = []
+                    raw = block.get("content", "")
+                    if isinstance(raw, list):
+                        raw = " ".join(r.get("text", "") for r in raw if isinstance(r, dict) and r.get("type") == "text")
+                    result.append({"role": "tool", "content": str(raw)})
+                elif block.get("type") == "text":
+                    text_parts.append(block.get("text", ""))
+            if text_parts:
+                result.append({"role": "user", "content": " ".join(text_parts)})
+        else:
+            result.append({"role": role, "content": _anthropic_content_to_str(content)})
+    return result
+
+
+def _anthropic_tools_to_ollama(tools: list) -> list:
+    """Transform Anthropic tools to Ollama/OpenAI function format."""
+    return [
+        {
+            "type": "function",
+            "function": {
+                "name": t["name"],
+                "description": t.get("description", ""),
+                "parameters": t.get("input_schema", {}),
+            },
+        }
+        for t in tools
+    ]
+
+
+def _ollama_to_anthropic_response(ollama_resp: dict, model_name: str, msg_id: str) -> dict:
+    """Transform an Ollama /api/chat response to Anthropic Messages API format."""
+    msg = ollama_resp.get("message", {})
+    text = msg.get("content", "")
+    tool_calls = msg.get("tool_calls") or []
+
+    content_blocks = []
+    if text:
+        content_blocks.append({"type": "text", "text": text})
+
+    stop_reason = "end_turn"
+    for i, tc in enumerate(tool_calls):
+        stop_reason = "tool_use"
+        fn = tc.get("function", {})
+        args = fn.get("arguments", {})
+        if isinstance(args, str):
+            try:
+                args = json.loads(args)
+            except json.JSONDecodeError:
+                args = {}
+        content_blocks.append({
+            "type": "tool_use",
+            "id": f"toolu_{msg_id}_{i}",
+            "name": fn.get("name", ""),
+            "input": args,
+        })
+
+    return {
+        "id": f"msg_{msg_id}",
+        "type": "message",
+        "role": "assistant",
+        "content": content_blocks,
+        "model": model_name,
+        "stop_reason": stop_reason,
+        "stop_sequence": None,
+        "usage": {
+            "input_tokens": ollama_resp.get("prompt_eval_count", 0),
+            "output_tokens": ollama_resp.get("eval_count", 0),
+        },
+    }
+
+
+@app.post("/v1/messages")
+async def anthropic_messages(request: Request, db: Session = Depends(get_db)):
+    ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
+    body = await request.json()
+
+    force_model = crud.get_setting(db, "force_model") or None
+    model_name = force_model or os.getenv("ANTHROPIC_DEFAULT_MODEL") or body.get("model")
+    if not model_name:
+        raise HTTPException(status_code=422, detail="Field 'model' is required")
+
+    anthropic_msgs = body.get("messages", [])
+    system = body.get("system")
+
+    system_str = _anthropic_content_to_str(system) if system else ""
+    all_text = system_str + " ".join(_anthropic_content_to_str(m.get("content")) for m in anthropic_msgs)
+    prompt_tokens = crud.count_tokens(all_text)
+
+    if not crud.check_and_increment_quota(db, request.state.api_key_id, tokens=prompt_tokens, requests=1):
+        raise HTTPException(status_code=429, detail="Quota exceeded")
+
+    ollama_messages = _anthropic_messages_to_ollama(anthropic_msgs, system=system_str)
+    ollama_body: dict = {"model": model_name, "messages": ollama_messages, "stream": body.get("stream", False)}
+    if tools := body.get("tools"):
+        ollama_body["tools"] = _anthropic_tools_to_ollama(tools)
+
+    msg_id = secrets.token_hex(12)
+    target = f"{ollama_url}/api/chat"
+
+    usage_log.info('%s | /v1/messages | %s | ~%d tokens | "%s"',
+                   request.state.api_key_name, model_name, prompt_tokens, _last_user_msg(ollama_messages))
+    start = time.monotonic()
+
+    if body.get("stream"):
+        # The backend is always called non-streaming; this proxy builds the SSE stream itself.
+        # This is necessary because upstream proxies (e.g. the production proxy) expose
+        # /api/chat only in non-streaming mode, so the client receives the answer in one piece.
+        non_stream_body = {**ollama_body, "stream": False}
+
+        async def generate():
+            try:
+                response = await proxy_request(target, method="POST", json_data=non_stream_body)
+                ollama_resp = response.json()
+            except Exception as exc:
+                error_log.error("Stream error | %s | /v1/messages | %s | %s: %s",
+                                request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
+                raise
+
+            result = _ollama_to_anthropic_response(ollama_resp, model_name, msg_id)
+            usage = result["usage"]
+
+            def sse(event: str, data: dict) -> str:
+                return f"event: {event}\ndata: {json.dumps(data)}\n\n"
+
+            # Replay the complete response as Anthropic SSE events. Tool calls become
+            # tool_use blocks; dropping them would break agent loops such as Claude Code.
+            yield sse("message_start", {"type": "message_start", "message": {
+                **result, "content": [], "stop_reason": None,
+                "usage": {"input_tokens": usage["input_tokens"], "output_tokens": 0}}})
+            yield sse("ping", {"type": "ping"})
+            for index, block in enumerate(result["content"]):
+                if block["type"] == "text":
+                    start_block = {"type": "text", "text": ""}
+                    delta = {"type": "text_delta", "text": block["text"]}
+                else:  # tool_use: the arguments are sent as one JSON string delta
+                    start_block = {**block, "input": {}}
+                    delta = {"type": "input_json_delta", "partial_json": json.dumps(block["input"])}
+                yield sse("content_block_start", {"type": "content_block_start", "index": index, "content_block": start_block})
+                yield sse("content_block_delta", {"type": "content_block_delta", "index": index, "delta": delta})
+                yield sse("content_block_stop", {"type": "content_block_stop", "index": index})
+            yield sse("message_delta", {"type": "message_delta",
+                                        "delta": {"stop_reason": result["stop_reason"], "stop_sequence": None},
+                                        "usage": {"output_tokens": usage["output_tokens"]}})
+            yield sse("message_stop", {"type": "message_stop"})
+            usage_log.info('%s | /v1/messages | %s | actual ↑%d ↓%d tokens | %.1fs',
+                           request.state.api_key_name, model_name,
+                           usage["input_tokens"], usage["output_tokens"],
+                           time.monotonic() - start)
+
+        return StreamingResponse(
+            generate(),
+            media_type="text/event-stream",
+            headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+        )
+
+    try:
+        response = await proxy_request(target, method="POST", json_data=ollama_body)
+        result = _ollama_to_anthropic_response(response.json(), model_name, msg_id)
+        usage_log.info('%s | /v1/messages | %s | actual ↑%d ↓%d tokens | %.1fs',
+                       request.state.api_key_name, model_name,
+                       result["usage"]["input_tokens"], result["usage"]["output_tokens"],
+                       time.monotonic() - start)
+        return JSONResponse(content=result, status_code=response.status_code)
+    except Exception as exc:
+        error_log.error("Proxy error | %s | /v1/messages | %s | %s: %s",
+                        request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
+        raise
+
@app.get("/v1/models")
 async def list_openai_models(db: Session = Depends(get_db)):
    ollama_url = crud.get_setting(db, "ollama_url", os.getenv("OLLAMA_URL", "http://localhost:11434"))
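The streaming branch of `/v1/messages` above makes one non-streaming backend call and replays the result as Anthropic SSE events. As written, it emits a single text block and always reports `stop_reason: "end_turn"`, so `tool_calls` in the backend response are not forwarded when the client sends `stream: true`. Below is a minimal sketch of how the same replay could also emit `tool_use` blocks. It follows the event sequence of Anthropic's streaming format: a `content_block_start` carrying a `tool_use` block, an `input_json_delta` with the arguments as a JSON string, `content_block_stop`, and finally `stop_reason: "tool_use"`. The helper names `_sse` and `anthropic_sse_events` and the `toolu_` id scheme are hypothetical. The Ollama response shape is assumed to match the fixtures in `test_anthropic_messages.py`.

```python
import json
import uuid
from typing import Iterator


def _sse(event: str, payload: dict) -> str:
    # One Anthropic-style SSE frame: an "event:" line, a "data:" line, then a blank line.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def anthropic_sse_events(ollama_resp: dict, model: str, msg_id: str) -> Iterator[str]:
    """Replay a complete Ollama /api/chat response as Anthropic Messages SSE events.

    Hypothetical helper: unlike the generator above, it also forwards
    `tool_calls` as `tool_use` content blocks.
    """
    msg = ollama_resp.get("message", {})
    text = msg.get("content", "")
    tool_calls = msg.get("tool_calls") or []
    input_tokens = ollama_resp.get("prompt_eval_count", 0)
    output_tokens = ollama_resp.get("eval_count", 0)

    yield _sse("message_start", {"type": "message_start", "message": {
        "id": f"msg_{msg_id}", "type": "message", "role": "assistant", "content": [],
        "model": model, "stop_reason": None, "stop_sequence": None,
        "usage": {"input_tokens": input_tokens, "output_tokens": 0}}})

    index = 0
    # Keep the existing behavior of always sending a text block when there are no tool calls.
    if text or not tool_calls:
        yield _sse("content_block_start", {"type": "content_block_start", "index": index,
                                           "content_block": {"type": "text", "text": ""}})
        if text:
            yield _sse("content_block_delta", {"type": "content_block_delta", "index": index,
                                               "delta": {"type": "text_delta", "text": text}})
        yield _sse("content_block_stop", {"type": "content_block_stop", "index": index})
        index += 1

    for call in tool_calls:
        fn = call.get("function", {})
        args = fn.get("arguments", {})
        # Ollama returns arguments as an object; Anthropic streams them as a JSON string.
        partial_json = args if isinstance(args, str) else json.dumps(args)
        yield _sse("content_block_start", {"type": "content_block_start", "index": index,
                                           "content_block": {"type": "tool_use",
                                                             "id": f"toolu_{uuid.uuid4().hex[:24]}",
                                                             "name": fn.get("name", ""),
                                                             "input": {}}})
        yield _sse("content_block_delta", {"type": "content_block_delta", "index": index,
                                           "delta": {"type": "input_json_delta",
                                                     "partial_json": partial_json}})
        yield _sse("content_block_stop", {"type": "content_block_stop", "index": index})
        index += 1

    yield _sse("message_delta", {"type": "message_delta",
                                 "delta": {"stop_reason": "tool_use" if tool_calls else "end_turn",
                                           "stop_sequence": None},
                                 "usage": {"output_tokens": output_tokens}})
    yield _sse("message_stop", {"type": "message_stop"})
```

Inside the handler, `generate()` could then `yield from anthropic_sse_events(ollama_resp, model_name, msg_id)` after the backend call, and keep its usage logging unchanged.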
@ -185,29 +417,51 @@ async def openai_chat_completions(request: Request, db: Session = Depends(get_db
    target = f"{ollama_url}/v1/chat/completions"

    if body.get("stream"):
+        existing_opts = body.get("stream_options") or {}
+        stream_body = {**body, "stream_options": {**existing_opts, "include_usage": True}}
+        start = time.monotonic()
+        usage_tokens = {"prompt": 0, "completion": 0}
+
        async def generate():
            try:
                async with httpx.AsyncClient(timeout=300.0) as client:
-                    async with client.stream("POST", target, json=body) as resp:
+                    async with client.stream("POST", target, json=stream_body, headers=_backend_headers()) as resp:
                        async for chunk in resp.aiter_bytes():
+                            try:
+                                for line in chunk.decode("utf-8", errors="ignore").splitlines():
+                                    if line.startswith("data: ") and "[DONE]" not in line:
+                                        data = json.loads(line[6:])
+                                        if u := data.get("usage"):
+                                            usage_tokens["prompt"] = u.get("prompt_tokens", 0)
+                                            usage_tokens["completion"] = u.get("completion_tokens", 0)
+                            except Exception:
+                                pass
                            yield chunk
            except Exception as exc:
                error_log.error("Stream error | %s | /v1/chat/completions | %s | %s: %s",
                                request.state.api_key_name, model_name, type(exc).__name__, exc, exc_info=exc)
                raise
+            finally:
+                usage_log.info('%s | /v1/chat/completions | %s | actual ↑%d ↓%d tokens | %.1fs',
+                               request.state.api_key_name, model_name,
+                               usage_tokens["prompt"], usage_tokens["completion"],
+                               time.monotonic() - start)
+
        return StreamingResponse(
            generate(),
            media_type="text/event-stream",
            headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
        )

+    start = time.monotonic()
    try:
        response = await proxy_request(target, method="POST", json_data=body)
        resp_json = response.json()
        usage = resp_json.get("usage", {})
-        usage_log.info('%s | /v1/chat/completions | %s | actual ↑%d ↓%d tokens',
+        usage_log.info('%s | /v1/chat/completions | %s | actual ↑%d ↓%d tokens | %.1fs',
                       request.state.api_key_name, model_name,
-                       usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0))
+                       usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0),
+                       time.monotonic() - start)
        return JSONResponse(content=resp_json, status_code=response.status_code)
    except Exception as exc:
        error_log.error("Proxy error | %s | /v1/chat/completions | %s | %s: %s",
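The streaming branch of `/v1/chat/completions` decodes and splits every raw chunk independently. `aiter_bytes()` does not guarantee that chunk boundaries coincide with SSE line boundaries. A `data:` line that arrives in two pieces therefore fails `json.loads` and is swallowed by the bare `except`, and the logged usage can fall back to 0. Below is a minimal sketch that buffers incomplete lines between chunks and uses an incremental UTF-8 decoder. It assumes the raw chunks must still be forwarded to the client unchanged, which rules out simply switching to `aiter_lines()`. `UsageSniffer` is a hypothetical name.

```python
import codecs
import json


class UsageSniffer:
    """Collect the OpenAI-style `usage` object from an SSE byte stream.

    Hypothetical helper: feed it the same raw chunks that are forwarded to the
    client. It buffers partial lines, so a `data:` line split across chunks is
    still parsed once it is complete.
    """

    def __init__(self) -> None:
        # The incremental decoder keeps multi-byte UTF-8 sequences split across chunks intact.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
        self._buffer = ""
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def feed(self, chunk: bytes) -> None:
        self._buffer += self._decoder.decode(chunk)
        # Everything before the last newline is complete; keep the tail for the next chunk.
        *lines, self._buffer = self._buffer.split("\n")
        for line in lines:
            self._handle(line.rstrip("\r"))

    def _handle(self, line: str) -> None:
        if not line.startswith("data: ") or line == "data: [DONE]":
            return
        try:
            data = json.loads(line[6:])
        except json.JSONDecodeError:
            return  # Not JSON: ignore it rather than interrupt the stream.
        usage = data.get("usage") if isinstance(data, dict) else None
        if usage:  # Non-final chunks carry "usage": null when include_usage is set.
            self.prompt_tokens = usage.get("prompt_tokens", 0)
            self.completion_tokens = usage.get("completion_tokens", 0)
```

In `generate()`, this would replace the inline parsing. Call `sniffer.feed(chunk)` before `yield chunk`, then read `sniffer.prompt_tokens` and `sniffer.completion_tokens` in the `finally` block.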
--- a/backend/tests/test_admin_logs.py
+++ b/backend/tests/test_admin_logs.py
@ -0,0 +1,59 @@
+import os
+import pytest
+from fastapi.testclient import TestClient
+
+os.environ.setdefault("ADMIN_PASSWORD", "test-admin-pw")
+os.environ.setdefault("OLLAMA_URL", "http://127.0.0.1:9999")
+
+
+@pytest.fixture
+def client(tmp_path):
+    log_file = tmp_path / "usage.log"
+    log_file.write_text("\n".join(f"Zeile {i}" for i in range(1, 16)) + "\n")
+    (tmp_path / "error.log").write_text("Fehler A\nFehler B\n")
+    os.environ["LOG_FILE"] = str(log_file)
+
+    from database import Base, engine
+    Base.metadata.drop_all(bind=engine)
+    Base.metadata.create_all(bind=engine)
+
+    from admin import app
+    yield TestClient(app, raise_server_exceptions=False)
+
+    Base.metadata.drop_all(bind=engine)
+    os.environ.pop("LOG_FILE", None)
+
+
+AUTH = {"Authorization": "Bearer test-admin-pw"}
+
+
+def test_logs_usage_returns_last_10_lines(client):
+    resp = client.get("/api/logs/usage", headers=AUTH)
+    assert resp.status_code == 200
+    lines = resp.json()["lines"]
+    assert len(lines) == 10
+    assert lines[-1] == "Zeile 15"
+    assert lines[0] == "Zeile 6"
+
+
+def test_logs_error_returns_content(client):
+    resp = client.get("/api/logs/error", headers=AUTH)
+    assert resp.status_code == 200
+    assert resp.json()["lines"] == ["Fehler A", "Fehler B"]
+
+
+def test_logs_missing_file_returns_empty(client, tmp_path):
+    os.environ["LOG_FILE"] = str(tmp_path / "nonexistent.log")
+    resp = client.get("/api/logs/usage", headers=AUTH)
+    assert resp.status_code == 200
+    assert resp.json()["lines"] == []
+
+
+def test_logs_invalid_name_returns_400(client):
+    resp = client.get("/api/logs/secret", headers=AUTH)
+    assert resp.status_code == 400
+
+
+def test_logs_requires_auth(client):
+    resp = client.get("/api/logs/usage")
+    assert resp.status_code == 401
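These tests pin down the contract of `GET /api/logs/{name}`. Only `usage` and `error` are accepted, and anything else returns 400. At most the last 10 lines come back, a missing file yields an empty list, and `error.log` is expected next to the file named by `LOG_FILE`. The endpoint itself is not part of this excerpt. Below is a sketch of the file-reading part under those assumptions, with the hypothetical names `resolve_log_path` and `read_log_tail`. Checking the name against a fixed set, rather than joining it into a path, also rules out path traversal.

```python
from collections import deque
from pathlib import Path

ALLOWED_LOGS = {"usage", "error"}  # Hypothetical: mirrors the names the tests accept.


def resolve_log_path(name: str, log_file: str) -> Path:
    """Map a log name to a file; raise ValueError for anything else (the API answers 400)."""
    if name not in ALLOWED_LOGS:
        raise ValueError(f"unknown log: {name}")
    usage_path = Path(log_file)
    # Assumption: error.log sits in the same directory as the usage log.
    return usage_path if name == "usage" else usage_path.with_name("error.log")


def read_log_tail(path: Path, n: int = 10) -> list[str]:
    """Return the last `n` lines of `path` without newlines, or [] if the file is missing."""
    if not path.is_file():
        return []
    with path.open(encoding="utf-8", errors="replace") as fh:
        # deque(maxlen=n) keeps only the most recent n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
```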
--- a/backend/tests/test_anthropic_messages.py
+++ b/backend/tests/test_anthropic_messages.py
@ -0,0 +1,272 @@
+import json
+import os
+from unittest.mock import AsyncMock, MagicMock, patch
+
+
+def _make_body(model="llama3", messages=None, stream=False, **kwargs):
+    body = {
+        "model": model,
+        "messages": messages or [{"role": "user", "content": "Hello"}],
+        "max_tokens": 100,
+    }
+    if stream:
+        body["stream"] = True
+    body.update(kwargs)
+    return body
+
+
+def _ollama_chat_response(content="Hi!", input_tokens=5, output_tokens=3):
+    return {
+        "model": "llama3",
+        "message": {"role": "assistant", "content": content},
+        "prompt_eval_count": input_tokens,
+        "eval_count": output_tokens,
+        "done": True,
+    }
+
+
+# --- Auth ---
+
+def test_messages_missing_auth_returns_401(test_client):
+    response = test_client.post("/v1/messages", json=_make_body())
+    assert response.status_code == 401
+
+
+def test_messages_invalid_key_returns_401(test_client):
+    response = test_client.post(
+        "/v1/messages",
+        headers={"x-api-key": "sk-invalid"},
+        json=_make_body(),
+    )
+    assert response.status_code == 401
+
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_accepts_anthropic_auth_token_header(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: _ollama_chat_response()
+    response = test_client.post(
+        "/v1/messages",
+        headers={"anthropic-auth-token": os.environ.get("TEST_API_KEY", "")},
+        json=_make_body(),
+    )
+    assert response.status_code == 200
+
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_accepts_x_api_key_header(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: _ollama_chat_response()
+    response = test_client.post(
+        "/v1/messages",
+        headers={"x-api-key": os.environ.get("TEST_API_KEY", "")},
+        json=_make_body(),
+    )
+    assert response.status_code == 200
+
+
+# --- Validation ---
+
+def test_messages_missing_model_returns_422(test_client):
+    env = {k: v for k, v in os.environ.items() if k != "ANTHROPIC_DEFAULT_MODEL"}
+    with patch.dict(os.environ, env, clear=True):
+        response = test_client.post(
+            "/v1/messages",
+            headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+            json={"messages": [{"role": "user", "content": "Hi"}], "max_tokens": 100},
+        )
+    assert response.status_code == 422
+
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_anthropic_default_model_used_when_no_model_in_request(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: _ollama_chat_response()
+    with patch.dict(os.environ, {"ANTHROPIC_DEFAULT_MODEL": "qwen3-coder:q8_0"}):
+        test_client.post(
+            "/v1/messages",
+            headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+            json={"messages": [{"role": "user", "content": "Hi"}], "max_tokens": 100},
+        )
+    sent_body = mock_proxy.call_args[1]["json_data"]
+    assert sent_body["model"] == "qwen3-coder:q8_0"
+
+
+# --- Quota ---
+
+def test_messages_quota_exceeded_returns_429(test_client):
+    with patch("main.crud.check_and_increment_quota", return_value=False):
+        response = test_client.post(
+            "/v1/messages",
+            headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+            json=_make_body(),
+        )
+        assert response.status_code == 429
+
+
+# --- Response format ---
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_returns_anthropic_format(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: _ollama_chat_response("Hello!")
+    response = test_client.post(
+        "/v1/messages",
+        headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+        json=_make_body(),
+    )
+    assert response.status_code == 200
+    data = response.json()
+    assert data["type"] == "message"
+    assert data["role"] == "assistant"
+    assert isinstance(data["content"], list)
+    assert data["content"][0]["type"] == "text"
+    assert data["content"][0]["text"] == "Hello!"
+    assert data["usage"]["input_tokens"] == 5
+    assert data["usage"]["output_tokens"] == 3
+
+
+# --- Request transformation ---
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_system_prompt_becomes_first_system_message(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: _ollama_chat_response()
+    test_client.post(
+        "/v1/messages",
+        headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+        json=_make_body(system="You are helpful"),
+    )
+    sent_body = mock_proxy.call_args[1]["json_data"]
+    assert sent_body["messages"][0]["role"] == "system"
+    assert sent_body["messages"][0]["content"] == "You are helpful"
+
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_tools_transformed_to_ollama_function_format(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: _ollama_chat_response()
+    test_client.post(
+        "/v1/messages",
+        headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+        json=_make_body(tools=[{
+            "name": "bash",
+            "description": "Run bash",
+            "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}},
+        }]),
+    )
+    sent_body = mock_proxy.call_args[1]["json_data"]
+    assert sent_body["tools"][0]["type"] == "function"
+    assert sent_body["tools"][0]["function"]["name"] == "bash"
+    assert "parameters" in sent_body["tools"][0]["function"]
+
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_tool_call_response_transformed_to_anthropic(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: {
+        "model": "llama3",
+        "message": {
+            "role": "assistant",
+            "content": "",
+            "tool_calls": [{"function": {"name": "bash", "arguments": {"command": "ls"}}}],
+        },
+        "prompt_eval_count": 10,
+        "eval_count": 5,
+        "done": True,
+    }
+    response = test_client.post(
+        "/v1/messages",
+        headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+        json=_make_body(),
+    )
+    data = response.json()
+    assert data["stop_reason"] == "tool_use"
+    tool_block = next(b for b in data["content"] if b["type"] == "tool_use")
+    assert tool_block["name"] == "bash"
+    assert tool_block["input"] == {"command": "ls"}
+
+
+# --- Streaming ---
+
+@patch("main.proxy_request", new_callable=AsyncMock)
+def test_messages_streaming_returns_anthropic_sse_events(mock_proxy, test_client):
+    mock_proxy.return_value.status_code = 200
+    mock_proxy.return_value.json = lambda: {
+        "model": "llama3",
+        "message": {"role": "assistant", "content": "Hi!"},
+        "prompt_eval_count": 5,
+        "eval_count": 3,
+        "done": True,
+    }
+
+    response = test_client.post(
+        "/v1/messages",
+        headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+        json=_make_body(stream=True),
+    )
+
+    assert response.status_code == 200
+    events = [
+        json.loads(line[6:])
+        for line in response.text.splitlines()
+        if line.startswith("data: ")
+    ]
+    event_types = [e["type"] for e in events]
+    assert "message_start" in event_types
+    assert "content_block_start" in event_types
+    assert "content_block_delta" in event_types
+    assert "message_stop" in event_types
+
+    deltas = [e for e in events if e["type"] == "content_block_delta"]
+    text = "".join(d["delta"]["text"] for d in deltas)
+    assert text == "Hi!"
+
+
+# --- Backend-Auth (BACKEND_API_KEY) ---
+
+def test_proxy_request_forwards_backend_api_key(test_client):
+    with patch("main.httpx.AsyncClient") as mock_cls:
+        mock_response = MagicMock()
+        mock_response.status_code = 200
+        mock_response.json.return_value = {"result": "ok"}
+
+        mock_instance = AsyncMock()
+        mock_instance.__aenter__ = AsyncMock(return_value=mock_instance)
+        mock_instance.__aexit__ = AsyncMock(return_value=False)
+        mock_instance.request = AsyncMock(return_value=mock_response)
+        mock_cls.return_value = mock_instance
+
+        with patch.dict(os.environ, {"BACKEND_API_KEY": "sk-backend-secret"}):
+            test_client.post(
+                "/api/generate",
+                headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+                json={"model": "llama3", "prompt": "hi"},
+            )
+
+        _, kwargs = mock_instance.request.call_args
+        assert kwargs.get("headers", {}).get("Authorization") == "Bearer sk-backend-secret"
+
+
+def test_proxy_request_omits_auth_header_when_no_backend_key(test_client):
+    with patch("main.httpx.AsyncClient") as mock_cls:
+        mock_response = MagicMock()
+        mock_response.status_code = 200
+        mock_response.json.return_value = {"result": "ok"}
+
+        mock_instance = AsyncMock()
+        mock_instance.__aenter__ = AsyncMock(return_value=mock_instance)
+        mock_instance.__aexit__ = AsyncMock(return_value=False)
+        mock_instance.request = AsyncMock(return_value=mock_response)
+        mock_cls.return_value = mock_instance
+
+        env_without_key = {k: v for k, v in os.environ.items() if k != "BACKEND_API_KEY"}
+        with patch.dict(os.environ, env_without_key, clear=True):
+            test_client.post(
+                "/api/generate",
+                headers={"Authorization": f"Bearer {os.environ.get('TEST_API_KEY', '')}"},
+                json={"model": "llama3", "prompt": "hi"},
+            )
+
+        _, kwargs = mock_instance.request.call_args
+        assert "Authorization" not in kwargs.get("headers", {})
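The request-transformation tests describe the Anthropic-to-Ollama mapping. A top-level `system` prompt becomes the first message with role `system`, and each tool's `input_schema` becomes `parameters` inside the OpenAI-style `{"type": "function", "function": {...}}` wrapper that Ollama accepts. The real helper in `main.py` is not shown here, so the following is a hypothetical sketch (`anthropic_to_ollama_request`, `_text_of`). It flattens content given as a list of blocks to its `text` parts only and does not map `tool_use` or `tool_result` blocks. The `max_tokens` to `options.num_predict` mapping is an assumption; it uses Ollama's option name for the generation limit.

```python
from typing import Any


def _text_of(content: Any) -> str:
    """Flatten Anthropic content (a string or a list of blocks) to plain text.

    Simplification: only `text` blocks are kept; tool_use/tool_result blocks are dropped.
    """
    if isinstance(content, str):
        return content
    return "\n".join(b.get("text", "") for b in content or [] if b.get("type") == "text")


def anthropic_to_ollama_request(body: dict, model: str) -> dict:
    """Hypothetical sketch of the /v1/messages -> Ollama /api/chat body mapping."""
    messages: list[dict] = []
    if body.get("system"):
        # Anthropic carries the system prompt outside `messages`; Ollama expects a system message first.
        messages.append({"role": "system", "content": _text_of(body["system"])})
    for m in body.get("messages", []):
        messages.append({"role": m["role"], "content": _text_of(m.get("content", ""))})

    out: dict = {"model": model, "messages": messages, "stream": False}
    if body.get("tools"):
        out["tools"] = [
            {"type": "function", "function": {
                "name": t["name"],
                "description": t.get("description", ""),
                # Anthropic's `input_schema` is a JSON Schema, which is what `parameters` holds.
                "parameters": t.get("input_schema", {"type": "object", "properties": {}}),
            }}
            for t in body["tools"]
        ]
    if "max_tokens" in body:
        # Assumption: map the Anthropic limit onto Ollama's num_predict option.
        out["options"] = {"num_predict": body["max_tokens"]}
    return out
```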
--- a/build_push.sh
+++ b/build_push.sh
@ -37,6 +37,7 @@ echo ""
 docker buildx build \
    --platform "$PLATFORM" \
    --push \
+    --build-arg APP_VERSION="$VERSION" \
    -t "$IMAGE:$VERSION" \
    -t "$IMAGE:latest" \
    .
--- a/frontend/src/main.jsx
+++ b/frontend/src/main.jsx
@ -76,14 +76,17 @@ const EMPTY_KEY_FORM = {
  name: '', expires_at: '', daily_tokens: '', monthly_tokens: '', daily_requests: '', monthly_requests: '',
 };

-function SettingsSection({ password }) {
+function SettingsSection({ password, refreshKey }) {
  const [settings, setSettings] = useState(null);
  const [availableModels, setAvailableModels] = useState([]);
  const [modelsLoading, setModelsLoading] = useState(false);
  const [ollamaReachable, setOllamaReachable] = useState(true);
  const [proxyEndpoint, setProxyEndpoint] = useState(null);
+  const [appVersion, setAppVersion] = useState(null);
  const [saved, setSaved] = useState(false);
  const [error, setError] = useState(null);
+  const [usageLog, setUsageLog] = useState([]);
+  const [errorLog, setErrorLog] = useState([]);

  const fetchModels = async (url, currentModel) => {
    setModelsLoading(true);
@ -108,16 +111,25 @@ function SettingsSection({ password }) {

  useEffect(() => {
    const headers = authHeaders(password);
-    Promise.all([
+    Promise.allSettled([
      axios.get('/api/settings', { headers }),
      axios.get('/api/proxy-info', { headers }),
-    ]).then(([settingsRes, proxyRes]) => {
-      const s = settingsRes.data;
+      axios.get('/api/logs/usage', { headers }),
+      axios.get('/api/logs/error', { headers }),
+    ]).then(([settingsRes, proxyRes, usageRes, errorRes]) => {
+      if (settingsRes.status === 'rejected' || proxyRes.status === 'rejected') {
+        setError('Einstellungen konnten nicht geladen werden.');
+        return;
+      }
+      const s = settingsRes.value.data;
      setSettings(s);
-      setProxyEndpoint(proxyRes.data.endpoint);
+      setProxyEndpoint(proxyRes.value.data.endpoint);
+      setAppVersion(proxyRes.value.data.version);
+      if (usageRes.status === 'fulfilled') setUsageLog(usageRes.value.data.lines);
+      if (errorRes.status === 'fulfilled') setErrorLog(errorRes.value.data.lines);
      fetchModels(s.ollama_url, s.force_model);
-    }).catch(() => setError('Einstellungen konnten nicht geladen werden.'));
-  }, []);
+    });
+  }, [refreshKey]);

  const handleSave = async (e) => {
    e.preventDefault();
@ -145,6 +157,10 @@ function SettingsSection({ password }) {
            <small> (Änderung erfordert Neustart)</small>
          </span>
        </div>
+        <div className="settings-row">
+          <label>Version</label>
+          <span className="settings-value">{appVersion ?? '…'}</span>
+        </div>
        <div className="settings-row">
          <label>Ollama-Endpunkt</label>
          <div className="settings-input-wrap">
@ -184,8 +200,18 @@ function SettingsSection({ password }) {
        </div>
        {error && <div className="error">{error}</div>}
        {saved && <div className="success">Gespeichert.</div>}
-        <button type="submit">Speichern</button>
+        <button type="submit">Änderungen übernehmen</button>
      </form>
+      <div className="log-section">
+        <h3>Nutzungslog (letzte 10 Einträge)</h3>
+        <pre className="log-pre">{usageLog.length > 0 ? usageLog.join('\n') : '— keine Einträge —'}</pre>
+        {errorLog.length > 0 && (
+          <>
+            <h3>Fehlerlog (letzte 10 Einträge)</h3>
+            <pre className="log-pre log-pre-error">{errorLog.join('\n')}</pre>
+          </>
+        )}
+      </div>
    </section>
  );
 }
@ -200,21 +226,31 @@ function App() {
  const [creating, setCreating] = useState(false);
  const [editKey, setEditKey] = useState(null);
  const [editForm, setEditForm] = useState({});
-
-  useEffect(() => {
-    if (!password) { setLoading(false); return; }
-    fetchApiKeys().finally(() => setLoading(false));
-  }, [password]);
+  const [refreshKey, setRefreshKey] = useState(0);
+  const [lastUpdated, setLastUpdated] = useState(null);

  const fetchApiKeys = async () => {
    try {
      const res = await axios.get('/api/api-keys', { headers: authHeaders(password) });
      setApiKeys(res.data);
+      setLastUpdated(new Date());
    } catch {
      setError('API-Keys konnten nicht geladen werden.');
    }
  };

+  useEffect(() => {
+    if (!password) { setLoading(false); return; }
+    fetchApiKeys().finally(() => setLoading(false));
+
+    const timer = setInterval(() => {
+      fetchApiKeys();
+      setRefreshKey(k => k + 1);
+    }, 5 * 60 * 1000);
+
+    return () => clearInterval(timer);
+  }, [password]);
+
  const handleCreate = async (e) => {
    e.preventDefault();
    setCreating(true);
@ -295,6 +331,7 @@ function App() {

  const logout = () => {
    sessionStorage.removeItem('admin_password');
+    setLastUpdated(null);
    setPassword(null);
  };

@ -306,10 +343,17 @@ function App() {
    <div className="container">
      <div className="header">
        <h1>Ollama Proxy Admin</h1>
-        <button onClick={logout}>Abmelden</button>
+        <div className="header-right">
+          {lastUpdated && (
+            <span className="last-updated">
+              Aktualisiert: {lastUpdated.toLocaleTimeString('de-DE', { hour: '2-digit', minute: '2-digit' })}
+            </span>
+          )}
+          <button onClick={logout}>Abmelden</button>
+        </div>
      </div>

-      <SettingsSection password={password} />
+      <SettingsSection password={password} refreshKey={refreshKey} />

      <section>
        <h2>Neuer API-Key</h2>
--- a/frontend/src/styles.css
+++ b/frontend/src/styles.css
@ -182,6 +182,7 @@ tr:hover {
 .settings-row label {
  width: 160px;
  flex-shrink: 0;
+  font-size: 14px;
  font-weight: 500;
  color: #2c3e50;
 }
@ -408,7 +409,7 @@ tr:hover {
 .edit-form label small {
  font-weight: 400;
  color: #999;
-  font-size: 11px;
+  font-size: 12px;
 }

 .edit-form input {
@ -452,3 +453,46 @@ tr:hover {
 .btn-cancel:hover {
  background: #7f8c8d;
 }
+
+.log-section {
+  margin-top: 24px;
+  border-top: 1px solid #eee;
+  padding-top: 16px;
+}
+
+.log-section h3 {
+  font-size: 14px;
+  font-weight: 600;
+  color: #34495e;
+  margin: 0 0 6px;
+}
+
+.log-pre {
+  background: #1e2a35;
+  color: #c8d6df;
+  font-family: 'Menlo', 'Consolas', monospace;
+  font-size: 11px;
+  line-height: 1.6;
+  padding: 10px 14px;
+  border-radius: 4px;
+  margin: 0 0 14px;
+  overflow-x: auto;
+  white-space: pre;
+}
+
+.log-pre-error {
+  background: #2d1b1b;
+  color: #f5a0a0;
+  margin-bottom: 0;
+}
+
+.header-right {
+  display: flex;
+  align-items: center;
+  gap: 16px;
+}
+
+.last-updated {
+  font-size: 12px;
+  color: #95a5a6;
+}
--- a/frontend/vite.config.js
+++ b/frontend/vite.config.js
@ -11,6 +11,7 @@ export default defineConfig({
      '/api/settings': 'http://localhost:8001',
      '/api/ollama-models': 'http://localhost:8001',
      '/api/proxy-info': 'http://localhost:8001',
+      '/api/logs': 'http://localhost:8001',
      '/api': 'http://localhost:8000',
    },
  },
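The new `/api/logs` entry only works because it is listed before the catch-all `/api`. To our understanding, Vite's dev-server proxy checks the keys in declaration order and forwards a request to the first key its path starts with. Keys beginning with `^` are treated as regular expressions, but this config does not use any. The sketch below models that first-match rule. `pick_proxy_target` is hypothetical and only mirrors the assumed behavior.

```python
from typing import Optional


def pick_proxy_target(url: str, rules: dict) -> Optional[str]:
    """Model of first-match prefix routing (assumed to mirror Vite's dev-server proxy).

    Rules are checked in insertion order; the first key the URL starts with wins.
    """
    for prefix, target in rules.items():
        if url.startswith(prefix):
            return target
    return None


ADMIN = "http://localhost:8001"
PROXY = "http://localhost:8000"

rules = {
    "/api/settings": ADMIN,
    "/api/ollama-models": ADMIN,
    "/api/proxy-info": ADMIN,
    "/api/logs": ADMIN,  # Must precede the catch-all "/api".
    "/api": PROXY,
}
```

With `/api` listed first, `/api/logs/usage` would be sent to the proxy on port 8000 instead of the admin API.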
--- a/run_tests.py
+++ b/run_tests.py
@ -1,8 +0,0 @@
-#!/usr/bin/env python3
-"""Pytest runner for Ollama Proxy tests."""
-import subprocess
-import sys
-
-if __name__ == "__main__":
-    result = subprocess.run([sys.executable, "-m", "pytest"] + sys.argv[1:], cwd="backend")
-    sys.exit(result.returncode)
--- a/start.sh
+++ b/start.sh
@ -1,17 +1,19 @@
 #!/bin/bash

+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+
 # Load .env
-if [ -f .env ]; then
+if [ -f "$SCRIPT_DIR/.env" ]; then
    set -a
-    source .env
+    source "$SCRIPT_DIR/.env"
    set +a
 fi

 # Activate the virtual environment if present
-if [ -f .venv/bin/activate ]; then
-    source .venv/bin/activate
-elif [ -f venv/bin/activate ]; then
-    source venv/bin/activate
+if [ -f "$SCRIPT_DIR/.venv/bin/activate" ]; then
+    source "$SCRIPT_DIR/.venv/bin/activate"
+elif [ -f "$SCRIPT_DIR/venv/bin/activate" ]; then
+    source "$SCRIPT_DIR/venv/bin/activate"
 fi

 if [ -z "$ADMIN_PASSWORD" ]; then
--- a/start_claude.sh
+++ b/start_claude.sh
@ -0,0 +1,33 @@
+#!/bin/bash
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+
+# Load .env
+if [ -f "$SCRIPT_DIR/.env" ]; then
+    set -a
+    source "$SCRIPT_DIR/.env"
+    set +a
+fi
+
+# API key: the first argument takes precedence, otherwise the PROXY_API_KEY environment variable
+API_KEY="${1:-$PROXY_API_KEY}"
+
+if [ -z "$API_KEY" ]; then
+    echo "Fehler: Kein API-Key angegeben."
+    echo "Verwendung: ./start_claude.sh sk-dein-key"
+    echo "       oder: PROXY_API_KEY=sk-dein-key ./start_claude.sh"
+    exit 1
+fi
+
+# 0.0.0.0 is a bind address, not a valid client host
+PROXY_HOST="${PROXY_HOST:-0.0.0.0}"
+PROXY_PORT="${PROXY_PORT:-8000}"
+if [ "$PROXY_HOST" = "0.0.0.0" ]; then
+    PROXY_HOST="localhost"
+fi
+
+export ANTHROPIC_BASE_URL="http://${PROXY_HOST}:${PROXY_PORT}"
+export ANTHROPIC_AUTH_TOKEN="$API_KEY"
+
+echo "Verbinde mit Proxy: $ANTHROPIC_BASE_URL"
+exec claude
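`start_claude.sh` hands the proxy URL and key to Claude Code through `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN`. To check `/v1/messages` without Claude Code, you can build a request with the standard library alone. The sketch below is illustrative. `client_base_url` repeats the script's substitution of `localhost` for `0.0.0.0`. The hypothetical `build_messages_request` sends the key in `x-api-key`, one of the headers the tests above accept alongside `anthropic-auth-token` and `Authorization: Bearer`.

```python
import json
import urllib.request


def client_base_url(host: str = "0.0.0.0", port: int = 8000) -> str:
    """Mirror start_claude.sh: 0.0.0.0 is a bind address, so clients use localhost instead."""
    if host == "0.0.0.0":
        host = "localhost"
    return f"http://{host}:{port}"


def build_messages_request(base_url: str, api_key: str, prompt: str,
                           model: str = "llama3", max_tokens: int = 100) -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST /v1/messages request."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Usage (requires a running proxy and a valid key):
# req = build_messages_request(client_base_url(), "sk-your-key", "Hello")
# with urllib.request.urlopen(req, timeout=300) as resp:
#     print(json.load(resp)["content"][0]["text"])
```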
--- a/test_api.sh
+++ b/test_api.sh
@ -1,7 +0,0 @@
-curl -X POST http://localhost:8000/api/generate \
-  -H "Authorization: sk-admin-key" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "llama3",
-    "prompt": "Test"
-  }'
Author	SHA1	Message	Date
Oliver Hofmann	92ed7368eb	Remove redundant run_tests.py wrapper	2026-05-10 11:51:13 +02:00
Oliver Hofmann	5a50d0be04	Remove obsolete test_api.sh	2026-05-10 11:47:15 +02:00
Oliver Hofmann	21cab46365	Remove docs/ from tracking, add to gitignore	2026-05-10 11:44:10 +02:00
Oliver Hofmann	fcaea9e3a9	Add license section to DockerHub descriptions	2026-05-10 11:21:39 +02:00
Oliver Hofmann	eb83c52b7f	Rename settings save button to 'Änderungen übernehmen'	2026-05-10 10:50:09 +02:00
Oliver Hofmann	f551b2a421	Harmonize typography: remove log uppercase, normalize label font sizes	2026-05-10 10:45:41 +02:00
Oliver Hofmann	79a30dd179	Route /api/logs to admin API in Vite proxy config	2026-05-10 10:40:41 +02:00
Oliver Hofmann	0353e0299f	Use Promise.allSettled so log fetch failures don't break settings display	2026-05-10 10:34:28 +02:00
Oliver Hofmann	5a94fc6d90	Reset lastUpdated on logout	2026-05-10 10:22:33 +02:00
Oliver Hofmann	cf1b3f7786	Add 5-minute auto-reload and last-updated timestamp to admin UI	2026-05-10 10:20:03 +02:00
Oliver Hofmann	02b4ad06ca	Fix pre whitespace, log-pre-error margin, error log heading	2026-05-10 10:18:27 +02:00
Oliver Hofmann	ca55783b90	Show last 10 log lines in settings section	2026-05-10 10:15:48 +02:00
Oliver Hofmann	a9b0168c71	Add GET /api/logs/{name} endpoint to admin API	2026-05-10 10:11:55 +02:00
Oliver Hofmann	7ce4d3a895	Add implementation plan: log viewer and auto-reload	2026-05-10 10:09:34 +02:00
Oliver Hofmann	fff9d1048d	Add design spec: log viewer and auto-reload for admin UI	2026-05-10 10:07:44 +02:00
Oliver Hofmann	cdd55880d6	Remove unnecessary bold formatting from Anthropic API feature entries	2026-05-10 10:00:34 +02:00
Oliver Hofmann	6b2ae4b072	Remove BACKEND_API_KEY from public documentation	2026-05-10 09:58:50 +02:00
Oliver Hofmann	4c8a8d4afb	Update Kurzanleitung: add gemma4:e4b, Claude Code section	2026-05-10 09:56:35 +02:00
Oliver Hofmann	9872175fb0	Add LICENSE, update docs with Anthropic endpoint and free-claude-code attribution	2026-05-10 09:53:12 +02:00
Oliver Hofmann	cc3ee5a03c	Add Anthropic Messages API compatibility layer (/v1/messages) - POST /v1/messages endpoint with full quota enforcement and auth - Accepts x-api-key and anthropic-auth-token headers (for Claude Code) - Transforms Anthropic request/response format ↔ Ollama /api/chat - Streaming support via Anthropic SSE format - Tool use support (request and response transformation) - ANTHROPIC_DEFAULT_MODEL env var for model selection without admin UI - BACKEND_API_KEY env var for forwarding auth to upstream proxies - Fix SQLite path always resolved relative to database.py location - start.sh and start_claude.sh load .env relative to script location	2026-05-10 09:45:38 +02:00
Oliver Hofmann	70fd61608b	Log actual tokens and elapsed time for all endpoints incl. streaming For streaming /v1/chat/completions: inject stream_options.include_usage, parse usage from SSE chunks, log actual ↑↓ tokens and wall time in the generator's finally block. Add elapsed time to all second log entries.	2026-05-08 09:47:32 +02:00
Oliver Hofmann	07f6fec4bf	Show app version in admin UI and /version endpoint Embed APP_VERSION build arg in Docker image (default: dev). build_push.sh passes the git tag as build arg. Proxy exposes GET /version, admin UI shows it as read-only field in settings.	2026-05-08 09:30:23 +02:00
Oliver Hofmann	6761a73364	Add /api/ps example to Kurzanleitung	2026-05-08 09:25:44 +02:00
Oliver Hofmann	0d1ce96c99	Expose /api/ps to show currently loaded model	2026-05-08 09:22:17 +02:00
Oliver Hofmann	b16b3af44d	Mention VPN as alias for Intranet in Kurzanleitung	2026-05-08 09:17:08 +02:00