diff --git a/.gitignore b/.gitignore
index af0c641..a4b28eb 100644
--- a/.gitignore
+++ b/.gitignore
@@ -7,4 +7,5 @@
 !.gitignore
 !*.service
 !*.timer
-!*.yaml
\ No newline at end of file
+!*.yaml
+!*.txt
\ No newline at end of file
diff --git a/readme.md b/readme.md
index 738cebf..063e485 100644
--- a/readme.md
+++ b/readme.md
@@ -1,67 +1,279 @@
 # Multimodal Driver State Analysis
-Ein umfassendes Framework zur Analyse von Fahrerverhalten durch kombinierte Feature-Extraktion aus Facial Action Units (AU) und Eye-Tracking Daten.
+This repository contains a full workflow for multimodal driver-state analysis in a simulator setting, from raw recording data to trained models and real-time inference.
-## 📋 Projektübersicht
+It combines two modalities:
+- Facial Action Units (AUs)
+- Eye-tracking features (fixations, saccades, blinks, pupil dynamics)
-Dieses Projekt verarbeitet multimodale Sensordaten aus Fahrsimulator-Studien und extrahiert zeitbasierte Features für die Analyse von Fahrerzuständen. Die Pipeline kombiniert:
+## What This Project Covers
-- **Facial Action Units (AU)**: 20 Gesichtsaktionseinheiten zur Emotionserkennung
-- **Eye-Tracking**: Fixationen, Sakkaden, Blinks und Pupillenmetriken
+- Data extraction from raw simulator files (`.h5` / ownCloud)
+- Conversion to subject-level Parquet files
+- Sliding-window feature engineering (AU + eye tracking)
+- Exploratory data analysis (EDA) notebooks
+- Model training experiments (CNN, XGBoost, Isolation Forest, OCSVM, DeepSVDD)
+- Real-time prediction from SQLite + MQTT publishing
+- Optional Linux `systemd` deployment (`predict.service` + `predict.timer`)
-## 🎯 Features
+## Repository Structure
-### Datenverarbeitung
-- **Sliding Window Aggregation**: 50-Sekunden-Fenster mit 5-Sekunden-Schrittweite
-- **Hierarchische Gruppierung**: Automatische Segmentierung nach STUDY/LEVEL/PHASE
-- **Robuste Fehlerbehandlung**: Graceful Degradation bei fehlenden Modalitäten
-
-### Extrahierte Features
-
-#### Facial Action Units (20 AUs)
-Für jede AU wird der Mittelwert pro Window berechnet:
-- AU01 (Inner Brow Raiser) bis AU43 (Eyes Closed)
-- Aggregation: `mean` über 50s Window
-
-#### Eye-Tracking Features
-**Fixationen:**
-- Anzahl nach Dauer-Kategorien (66-150ms, 300-500ms, >1000ms, >100ms)
-- Mittelwert und Median der Fixationsdauer
-
-**Sakkaden:**
-- Anzahl, mittlere Amplitude, mittlere/mediane Dauer
-
-**Blinks:**
-- Anzahl, mittlere/mediane Dauer
-
-**Pupille:**
-- Mittlere Pupillengröße
-- Index of Pupillary Activity (IPA) - Hochfrequenzkomponente (0.6-2.0 Hz)
-
-## 🏗️ Projektstruktur
-
-to be continued.
-
-## 🚀 Installation
-
-### Voraussetzungen
-```bash
-Python 3.12
+```text
+Fahrsimulator_MSY2526_AI/
+|-- dataset_creation/
+|   |-- parquet_file_creation.py
+|   |-- create_parquet_files_from_owncloud.py
+|   |-- combined_feature_creation.py
+|   |-- maxDist.py
+|   |-- AU_creation/
+|   |   |-- AU_creation_service.py
+|   |   `-- pyfeat_docu.ipynb
+|   `-- camera_handling/
+|       |-- camera_stream_AU_and_ET_new.py
+|       |-- eyeFeature_new.py
+|       |-- db_helper.py
+|       `-- *.py (legacy variants/tests)
+|-- EDA/
+|   `-- *.ipynb
+|-- model_training/
+|   |-- CNN/
+|   |-- xgboost/
+|   |-- IsolationForest/
+|   |-- OCSVM/
+|   |-- DeepSVDD/
+|   |-- MAD_outlier_removal/
+|   `-- tools/
+|-- predict_pipeline/
+|   |-- predict_sample.py
+|   |-- config.yaml
+|   |-- predict.service
+|   |-- predict.timer
+|   |-- predict_service_timer_documentation.md
+|   `-- fill_db.ipynb
+|-- tools/
+|   `-- db_helpers.py
+`-- readme.md
 ```
+
+## End-to-End Workflow
+
+### 1) Data Ingestion and Conversion
+
+Main scripts:
+- `dataset_creation/create_parquet_files_from_owncloud.py`
+- `dataset_creation/parquet_file_creation.py`
+
+Purpose:
+- Load simulator recordings from ownCloud or from local `.h5` files.
+- Select the relevant columns (`STUDY`, `LEVEL`, `PHASE`, `FACE_AU*`, `EYE_*`).
+- Filter invalid rows (for example `LEVEL == 0`).
+- Save cleaned subject-level Parquet files.
+
+Notes:
+- These scripts contain placeholders for paths and credentials that must be adapted.
+- The ownCloud download uses `pyocclient` (the `owncloud` module).
+
+### 2) Feature Engineering (Offline Dataset)
+
+Main script:
+- `dataset_creation/combined_feature_creation.py`
+
+Behavior:
+- Processes all Parquet files in an input directory.
+- Applies sliding windows:
+  - Window size: 50 seconds (`25 Hz * 50 s = 1250 samples`)
+  - Step size: 5 seconds (`125 samples`)
+- Groups data by the available context columns (`STUDY`, `LEVEL`, `PHASE`).
+- Computes:
+  - AU means per window (`FACE_AUxx_mean`)
+  - Eye-tracking features:
+    - Fixation counts and duration statistics
+    - Saccade count, amplitude, and duration statistics
+    - Blink count and duration statistics
+    - Pupil mean and IPA (high-frequency pupil activity)
+
+Output:
+- A combined Parquet dataset (one row per window), ready for model training.
+
+### 3) Camera-Based Online Feature Extraction
+
+Main scripts:
+- `dataset_creation/camera_handling/camera_stream_AU_and_ET_new.py`
+- `dataset_creation/camera_handling/eyeFeature_new.py`
+
+Behavior:
+- Captures the webcam stream (`OpenCV`) at ~25 FPS.
+- Computes eye metrics with `MediaPipe`.
+- Records 50-second overlapping segments (a new segment starts every 5 seconds).
+- Extracts AUs from the recorded clips using `py-feat`.
+- Extracts eye features from the saved gaze Parquet file.
+- Writes combined feature rows into an SQLite table (`feature_table`).
+
+Important:
+- Script paths and DB locations are currently hardcoded for the target environment and must be adapted.
+
+### 4) Model Training
+
+Location:
+- `model_training/` (mostly notebook-driven)
+
+Includes experiments for:
+- CNN-based fusion variants
+- XGBoost
+- Isolation Forest
+- OCSVM
+- DeepSVDD
+
+Utility modules:
+- `model_training/tools/scaler.py` for fitting/saving/applying scalers
+- `model_training/tools/mad_outlier_removal.py`
+- `model_training/tools/performance_split.py`
+- `model_training/tools/evaluation_tools.py`
+
+### 5) Real-Time Prediction and Messaging
+
+Main script:
+- `predict_pipeline/predict_sample.py`
+
+Runtime behavior:
+- Reads the latest row from SQLite (`database.path`, `database.table`, `database.key`).
+- Replaces NaN values with the fallback medians from `config.yaml`.
+- Optionally scales features using a saved scaler (`.pkl` or `.joblib`).
+- Loads the model (`.keras`, `.pkl`, or `.joblib`) and predicts.
+- Publishes a JSON message via MQTT (topic/host/QoS from the config).
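The runtime steps above (read newest row, fill NaNs from fallbacks, build the outgoing payload) can be sketched as below. This is a minimal illustration, not the repository's `predict_sample.py`: the table, key, column, and fallback names are placeholders, and the actual model call and paho-mqtt publish are omitted.

```python
import json
import sqlite3

def fetch_latest_sample(conn, table, key, feature_cols, fallbacks):
    """Return (id, features) for the newest row ordered by `key`;
    NULL feature values are replaced by fallback medians."""
    # Table/column identifiers come from trusted config, not user input.
    cols = ", ".join([key] + feature_cols)
    row = conn.execute(
        f"SELECT {cols} FROM {table} ORDER BY {key} DESC LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    features = [
        v if v is not None else fallbacks[c]
        for c, v in zip(feature_cols, row[1:])
    ]
    return row[0], features

def build_payload(sample_id, prediction, result_key="prediction"):
    """JSON message in the shape published via MQTT."""
    return json.dumps({"valid": True, "_id": sample_id, result_key: prediction})

# Demo with an in-memory database and a placeholder schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature_table (_Id INTEGER, FACE_AU01_mean REAL, blink_count REAL)"
)
conn.execute("INSERT INTO feature_table VALUES (1, 0.20, 3.0), (2, NULL, 5.0)")
sample_id, feats = fetch_latest_sample(
    conn, "feature_table", "_Id",
    ["FACE_AU01_mean", "blink_count"],
    {"FACE_AU01_mean": 0.12, "blink_count": 4.0},
)
print(sample_id, feats)           # 2 [0.12, 5.0]
print(build_payload(sample_id, 0))
```

In the real pipeline the resulting JSON string would be handed to a paho-mqtt client's `publish()` with the topic and QoS taken from `config.yaml`.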
+
+Message shape:
+```json
+{
+  "valid": true,
+  "_id": 123,
+  "prediction": 0
+}
+```
+(The `prediction` key is configurable via `mqtt.publish_format.result_key`.)
+
+### 6) Automated Execution with systemd (Linux)
+
+Files:
+- `predict_pipeline/predict.service`
+- `predict_pipeline/predict.timer`
+
+Current timer behavior:
+- First run after 60 s (`OnActiveSec=60`)
+- Then every 5 s (`OnUnitActiveSec=5`)
+
+Detailed operation and commands:
+- `predict_pipeline/predict_service_timer_documentation.md`
+
-### Dependencies
+## Installation
+
+Install dependencies from the tracked requirements file:
+
 ```bash
 pip install -r requirements.txt
 ```
-**Wichtigste Pakete:**
-- `pandas`, `numpy` - Datenverarbeitung
-- `scipy` - Signalverarbeitung
-- `scikit-learn` - Feature-Skalierung & ML
-- `pygazeanalyser` - Eye-Tracking Analyse
-- `pyarrow` - Parquet I/O
+### Python Version
-## 💻 Usage
+
+Recommended:
+- Python `3.10` to `3.12`
-### 1. Feature-Extraktion
- to be continued
\ No newline at end of file
+
+### Core Dependencies
+
+```bash
+pip install numpy pandas scipy scikit-learn pyarrow pyyaml joblib paho-mqtt matplotlib
+```
+
+### Computer Vision / Eye Tracking / AU Stack
+
+```bash
+pip install opencv-python mediapipe torch moviepy
+pip install pygazeanalyser
+pip install py-feat
+```
+
+### Data Access (optional)
+
+```bash
+pip install pyocclient h5py tables
+```
+
+### Notes
+- `tensorflow` is required for `.keras` model inference in `predict_sample.py`.
+- `py-feat`, `mediapipe`, and `torch` can be platform-sensitive; pin versions for your target machine.
+
+## Configuration
+
+Primary runtime config:
+- `predict_pipeline/config.yaml`
+
+Sections:
+- `database`: SQLite path/table/key
+- `model`: model file path
+- `scaler`: scaling toggle + scaler path
+- `mqtt`: broker connection + publish format
+- `sample.columns`: expected feature order
+- `fallback`: median/default feature values used for NaN replacement
+
+Before running prediction, verify all absolute paths in `config.yaml`.
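The section list above maps onto a layout along these lines. This is a hypothetical sketch only: every path, host, and numeric value below is a placeholder, not the repository's actual `config.yaml`.

```yaml
# Illustrative sketch -- all paths, hosts, and values are placeholders.
database:
  path: /opt/driver-state/features.db   # SQLite file
  table: feature_table
  key: _Id
model:
  path: /opt/driver-state/model.keras
scaler:
  enabled: true
  path: /opt/driver-state/scaler.pkl
mqtt:
  host: localhost
  port: 1883
  topic: driver_state/prediction
  qos: 1
  publish_format:
    result_key: prediction
sample:
  columns:
    - FACE_AU01_mean
    - blink_count
fallback:
  FACE_AU01_mean: 0.12
  blink_count: 4
```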
+
+## Quick Start
+
+### A) Build Training Dataset (Offline)
+
+1. Set input/output paths in:
+   - `dataset_creation/parquet_file_creation.py`
+   - `dataset_creation/combined_feature_creation.py`
+
+2. Generate subject Parquet files:
+```bash
+python dataset_creation/parquet_file_creation.py
+```
+
+3. Generate the combined sliding-window feature dataset:
+```bash
+python dataset_creation/combined_feature_creation.py
+```
+
+### B) Run Prediction Once
+
+1. Update the paths in `predict_pipeline/config.yaml`.
+2. Run:
+```bash
+python predict_pipeline/predict_sample.py
+```
+
+### C) Run as systemd Service + Timer (Linux)
+
+1. Copy the unit files to `/etc/systemd/system/`.
+2. Adjust `ExecStart` and the user in `predict.service`.
+3. Enable and start the timer:
+```bash
+sudo systemctl daemon-reload
+sudo systemctl enable predict.timer
+sudo systemctl start predict.timer
+```
+
+Monitor logs:
+```bash
+journalctl -u predict.service -f
+```
+
+## Database and Table Expectations
+
+The prediction script expects an SQLite table with at least:
+- `_Id`
+- `start_time`
+- all model feature columns listed in `config.yaml` under `sample.columns`
+
+The camera pipeline writes feature rows into `feature_table` using the helper utilities in:
+- `dataset_creation/camera_handling/db_helper.py`
+- `tools/db_helpers.py`
+
+## License
+
+No license file is currently present in this repository.
+Add a `LICENSE` file if this project should be shared or reused externally.
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..7272bf9
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,26 @@
+# Core data + ML utilities
+numpy
+pandas
+scipy
+scikit-learn
+pyarrow
+joblib
+PyYAML
+matplotlib
+
+# Prediction pipeline
+paho-mqtt
+tensorflow
+
+# Camera / feature extraction stack
+opencv-python
+mediapipe
+torch
+moviepy
+pygazeanalyser
+py-feat
+
+# Data ingestion (ownCloud + HDF)
+pyocclient
+h5py
+tables