First draft for readme + requirements.txt

This commit is contained in:
Michael Weig 2026-03-05 13:18:17 +01:00
parent f95d59e44d
commit a4b7190756
3 changed files with 292 additions and 53 deletions

.gitignore (vendored)

@@ -7,4 +7,5 @@
!.gitignore
!*.service
!*.timer
!*.yaml
!*.txt

readme.md

@@ -1,67 +1,279 @@
# Multimodal Driver State Analysis
This repository contains a full workflow for multimodal driver-state analysis in a simulator setting, from raw recording data to trained models and real-time inference.
It combines two modalities:
- **Facial Action Units (AUs)**: 20 facial action units for emotion recognition
- **Eye-tracking features**: fixations, saccades, blinks, and pupil dynamics
## What This Project Covers
- Data extraction from raw simulator files (`.h5` / ownCloud)
- Conversion to subject-level Parquet files
- Sliding-window feature engineering (AU + eye tracking)
- Exploratory data analysis (EDA) notebooks
- Model training experiments (CNN, XGBoost, Isolation Forest, OCSVM, DeepSVDD)
- Real-time prediction from SQLite + MQTT publishing
- Optional Linux `systemd` deployment (`predict.service` + `predict.timer`)
## Features
### Data Processing
- **Sliding Window Aggregation**: 50-second windows with a 5-second step size
- **Hierarchical Grouping**: automatic segmentation by STUDY/LEVEL/PHASE
- **Robust Error Handling**: graceful degradation when a modality is missing
### Extracted Features
#### Facial Action Units (20 AUs)
For each AU, the mean per window is computed:
- AU01 (Inner Brow Raiser) through AU43 (Eyes Closed)
- Aggregation: `mean` over the 50 s window
#### Eye-Tracking Features
**Fixations:**
- Counts by duration category (66-150 ms, 300-500 ms, >1000 ms, >100 ms)
- Mean and median fixation duration
**Saccades:**
- Count, mean amplitude, mean/median duration
**Blinks:**
- Count, mean/median duration
**Pupil:**
- Mean pupil size
- Index of Pupillary Activity (IPA): high-frequency component (0.6-2.0 Hz)
## Repository Structure
```text
Fahrsimulator_MSY2526_AI/
|-- dataset_creation/
| |-- parquet_file_creation.py
| |-- create_parquet_files_from_owncloud.py
| |-- combined_feature_creation.py
| |-- maxDist.py
| |-- AU_creation/
| | |-- AU_creation_service.py
| | `-- pyfeat_docu.ipynb
| `-- camera_handling/
| |-- camera_stream_AU_and_ET_new.py
| |-- eyeFeature_new.py
| |-- db_helper.py
| `-- *.py (legacy variants/tests)
|-- EDA/
| `-- *.ipynb
|-- model_training/
| |-- CNN/
| |-- xgboost/
| |-- IsolationForest/
| |-- OCSVM/
| |-- DeepSVDD/
| |-- MAD_outlier_removal/
| `-- tools/
|-- predict_pipeline/
| |-- predict_sample.py
| |-- config.yaml
| |-- predict.service
| |-- predict.timer
| |-- predict_service_timer_documentation.md
| `-- fill_db.ipynb
|-- tools/
| `-- db_helpers.py
`-- readme.md
```
## End-to-End Workflow
### 1) Data Ingestion and Conversion
Main scripts:
- `dataset_creation/create_parquet_files_from_owncloud.py`
- `dataset_creation/parquet_file_creation.py`
Purpose:
- Load simulator recordings from ownCloud or local `.h5` files.
- Select relevant columns (`STUDY`, `LEVEL`, `PHASE`, `FACE_AU*`, `EYE_*`).
- Filter invalid rows (for example `LEVEL == 0`).
- Save cleaned subject-level Parquet files.
Notes:
- These scripts contain placeholders for paths and credentials that must be adapted.
- ownCloud download uses `pyocclient` (`owncloud` module).
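Under the stated assumptions, the select-and-filter step of this conversion might look as follows. The column prefixes (`STUDY`, `LEVEL`, `PHASE`, `FACE_AU*`, `EYE_*`) and the `LEVEL == 0` filter come from the description above; `select_and_filter` and the commented-out paths/HDF key are illustrative names, not the repository's actual API:

```python
import pandas as pd

CONTEXT_COLS = ("STUDY", "LEVEL", "PHASE")

def select_and_filter(df: pd.DataFrame) -> pd.DataFrame:
    """Keep context columns plus all AU / eye-tracking channels,
    then drop invalid rows (LEVEL == 0)."""
    cols = [c for c in df.columns
            if c in CONTEXT_COLS or c.startswith(("FACE_AU", "EYE_"))]
    trimmed = df[cols]
    return trimmed[trimmed["LEVEL"] != 0].reset_index(drop=True)

# Typical usage (paths and HDF key are placeholders):
# raw = pd.read_hdf("subject_01.h5", key="data")
# select_and_filter(raw).to_parquet("subject_01.parquet", index=False)
```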
### 2) Feature Engineering (Offline Dataset)
Main script:
- `dataset_creation/combined_feature_creation.py`
Behavior:
- Processes all Parquet files in an input directory.
- Applies sliding windows:
- Window size: 50 seconds (`25 Hz * 50 = 1250 samples`)
- Step size: 5 seconds (`125 samples`)
- Groups data by available context columns (`STUDY`, `LEVEL`, `PHASE`).
- Computes:
- AU means per window (`FACE_AUxx_mean`)
- Eye-tracking features:
- Fixation counts and duration stats
- Saccade count/amplitude/duration stats
- Blink count/duration stats
- Pupil mean and IPA (high-frequency pupil activity)
Output:
- A combined Parquet dataset (one row per window), ready for model training.
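The AU part of this windowing can be sketched as below. The 25 Hz rate and the 1250/125 sample sizes are taken from the numbers above; `window_au_means` is a hypothetical helper, not the repository function:

```python
import pandas as pd

FS = 25                # sampling rate (Hz)
WINDOW = 50 * FS       # 1250 samples = 50 s window
STEP = 5 * FS          # 125 samples = 5 s step

def window_au_means(group: pd.DataFrame, au_cols: list[str]) -> pd.DataFrame:
    """Slide a 50 s window in 5 s steps over one STUDY/LEVEL/PHASE group
    and compute the mean of each AU channel per window."""
    rows = []
    for start in range(0, len(group) - WINDOW + 1, STEP):
        win = group.iloc[start:start + WINDOW]
        row = {f"{c}_mean": win[c].mean() for c in au_cols}
        row["window_start_sample"] = start
        rows.append(row)
    return pd.DataFrame(rows)
```

The eye-tracking features are computed per window in the same pass; as a sanity check, a 60 s group at 25 Hz (1500 samples) yields `(1500 - 1250) // 125 + 1 = 3` windows.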
### 3) Camera-Based Online Feature Extraction
Main scripts:
- `dataset_creation/camera_handling/camera_stream_AU_and_ET_new.py`
- `dataset_creation/camera_handling/eyeFeature_new.py`
Behavior:
- Captures webcam stream (`OpenCV`) at ~25 FPS.
- Computes eye metrics with `MediaPipe`.
- Records 50-second overlapping segments (new start every 5 seconds).
- Extracts AUs from recorded clips using `py-feat`.
- Extracts eye features from saved gaze parquet.
- Writes combined feature rows into an SQLite table (`feature_table`).
Important:
- Script paths and DB locations are currently hardcoded for the target environment and must be adapted.
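The overlapping-segment schedule (a new 50 s segment starting every 5 s) can be made concrete with a small helper; the name and signature are illustrative, not taken from the scripts:

```python
import math

def active_segments(t: float, seg_len: float = 50.0, step: float = 5.0) -> list[float]:
    """Start times of every overlapping segment that covers time t,
    when a new seg_len-second segment begins every step seconds.
    Segments are half-open intervals [start, start + seg_len)."""
    # earliest segment still covering t, clamped to the recording start
    first = max(0.0, (math.floor((t - seg_len) / step) + 1) * step)
    # latest segment that has already started
    last = math.floor(t / step) * step
    n = int(round((last - first) / step)) + 1
    return [first + k * step for k in range(n)]
```

In steady state each frame therefore lands in `seg_len / step = 10` concurrent segments, which is why the recorder keeps multiple open clips at once.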
### 4) Model Training
Location:
- `model_training/` (mostly notebook-driven)
Includes experiments for:
- CNN-based fusion variants
- XGBoost
- Isolation Forest
- OCSVM
- DeepSVDD
Utility modules:
- `model_training/tools/scaler.py` for fitting/saving/applying scalers
- `model_training/tools/mad_outlier_removal.py`
- `model_training/tools/performance_split.py`
- `model_training/tools/evaluation_tools.py`
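A minimal sketch of the fit/save/apply pattern such a scaler module typically exposes, assuming a `StandardScaler` persisted with `joblib` (the actual class and function names in `scaler.py` may differ):

```python
import joblib
from sklearn.preprocessing import StandardScaler

def fit_and_save_scaler(X_train, path: str) -> StandardScaler:
    """Fit on training windows only, then persist so the prediction
    pipeline can reapply the exact same transform at inference time."""
    scaler = StandardScaler().fit(X_train)
    joblib.dump(scaler, path)
    return scaler

def load_and_apply_scaler(X, path: str):
    """Load a persisted scaler and transform new feature rows."""
    return joblib.load(path).transform(X)
```

Fitting on the training split only (and reloading the same file in `predict_sample.py`) avoids train/serve skew between offline experiments and the live pipeline.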
### 5) Real-Time Prediction and Messaging
Main script:
- `predict_pipeline/predict_sample.py`
Runtime behavior:
- Reads latest row from SQLite (`database.path`, `database.table`, `database.key`).
- Applies NaN handling using fallback medians from `config.yaml`.
- Optionally scales features using a saved scaler (`.pkl` or `.joblib`).
- Loads model (`.keras`, `.pkl`, or `.joblib`) and predicts.
- Publishes JSON message via MQTT (topic/host/qos from config).
Message shape:
```json
{
  "valid": true,
  "_id": 123,
  "prediction": 0
}
```
(`prediction` key is configurable via `mqtt.publish_format.result_key`.)
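Assembling that payload can be sketched as follows; `build_message` is a hypothetical helper (the real script's internals may differ), but the keys match the shape shown above:

```python
import json

def build_message(row_id: int, prediction: int, valid: bool = True,
                  result_key: str = "prediction") -> str:
    """Build the JSON payload; result_key mirrors the configurable
    mqtt.publish_format.result_key setting."""
    return json.dumps({"valid": valid, "_id": row_id, result_key: prediction})

# Publishing would then look roughly like (paho-mqtt; topic/host/qos from config):
# client.publish(topic, build_message(123, 0), qos=qos)
```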
### 6) Automated Execution with systemd (Linux)
Files:
- `predict_pipeline/predict.service`
- `predict_pipeline/predict.timer`
Current timer behavior:
- first run after 60s (`OnActiveSec=60`)
- then every 5s (`OnUnitActiveSec=5`)
Detailed operation and commands:
- `predict_pipeline/predict_service_timer_documentation.md`
## Installation
Install dependencies from the tracked requirements file:
```bash
pip install -r requirements.txt
```
**Key packages:**
- `pandas`, `numpy` - data handling
- `scipy` - signal processing
- `scikit-learn` - feature scaling & ML
- `pygazeanalyser` - eye-tracking analysis
- `pyarrow` - Parquet I/O
## Python Version
Recommended:
- Python `3.10` to `3.12`
## Core Dependencies
```bash
pip install numpy pandas scipy scikit-learn pyarrow pyyaml joblib paho-mqtt matplotlib
```
## Computer Vision / Eye Tracking / AU Stack
```bash
pip install opencv-python mediapipe torch moviepy
pip install pygazeanalyser
pip install py-feat
```
## Data Access (optional)
```bash
pip install pyocclient h5py tables
```
## Notes
- `tensorflow` is required for `.keras` model inference in `predict_sample.py`.
- `py-feat`, `mediapipe`, and `torch` can be platform-sensitive; pin versions per your target machine.
## Configuration
Primary runtime config:
- `predict_pipeline/config.yaml`
Sections:
- `database`: SQLite path/table/key
- `model`: model file path
- `scaler`: scaling toggle + scaler path
- `mqtt`: broker connection + publish format
- `sample.columns`: expected feature order
- `fallback`: median/default feature values used for NaN replacement
Before running prediction, verify all absolute paths in `config.yaml`.
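The sections above imply a layout roughly like the following. This is an illustrative sketch only: paths, values, and any key names not listed above are guesses, not the tracked file's contents:

```yaml
database:
  path: /path/to/features.db        # SQLite file
  table: feature_table
  key: _Id
model:
  path: /path/to/model.keras
scaler:
  use_scaler: true                  # assumed toggle name
  path: /path/to/scaler.joblib
mqtt:
  host: localhost
  port: 1883
  topic: driver/state               # placeholder topic
  qos: 1
sample:
  columns: [FACE_AU01_mean, FACE_AU02_mean]   # full ordered feature list
fallback:
  FACE_AU01_mean: 0.02              # median used for NaN replacement
```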
## Quick Start
### A) Build Training Dataset (Offline)
1. Set input/output paths in:
- `dataset_creation/parquet_file_creation.py`
- `dataset_creation/combined_feature_creation.py`
2. Generate subject Parquet files:
```bash
python dataset_creation/parquet_file_creation.py
```
3. Generate combined sliding-window feature dataset:
```bash
python dataset_creation/combined_feature_creation.py
```
### B) Run Prediction Once
1. Update paths in `predict_pipeline/config.yaml`.
2. Run:
```bash
python predict_pipeline/predict_sample.py
```
### C) Run as systemd Service + Timer (Linux)
1. Copy unit files to `/etc/systemd/system/`.
2. Adjust `ExecStart` and user in `predict.service`.
3. Enable and start timer:
```bash
sudo systemctl daemon-reload
sudo systemctl enable predict.timer
sudo systemctl start predict.timer
```
Monitor logs:
```bash
journalctl -u predict.service -f
```
## Database and Table Expectations
The prediction script expects a SQLite table with at least:
- `_Id`
- `start_time`
- all model feature columns listed in `config.yaml` under `sample.columns`
The camera pipeline writes feature rows into `feature_table` using helper utilities in:
- `dataset_creation/camera_handling/db_helper.py`
- `tools/db_helpers.py`
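Fetching the newest row as described can be sketched like this; the helper name and exact SQL are assumptions (the real script reads the table and key names from `config.yaml`):

```python
import sqlite3

def read_latest_row(db_path: str, table: str, key: str = "_Id"):
    """Return the most recently inserted feature row as a dict, or None
    if the table is empty. Identifiers are interpolated into the SQL,
    so they must come from trusted config, never from user input."""
    con = sqlite3.connect(db_path)
    con.row_factory = sqlite3.Row
    try:
        cur = con.execute(f"SELECT * FROM {table} ORDER BY {key} DESC LIMIT 1")
        row = cur.fetchone()
        return dict(row) if row is not None else None
    finally:
        con.close()
```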
## License
No license file is currently present in this repository.
Add a `LICENSE` file if this project should be shared or reused externally.

requirements.txt (new file)

@@ -0,0 +1,26 @@
# Core data + ML utilities
numpy
pandas
scipy
scikit-learn
pyarrow
joblib
PyYAML
matplotlib
# Prediction pipeline
paho-mqtt
tensorflow
# Camera / feature extraction stack
opencv-python
mediapipe
torch
moviepy
pygazeanalyser
py-feat
# Data ingestion (ownCloud + HDF)
pyocclient
h5py
tables