# Project Report: Multimodal Driver State Analysis

## 1) Project Scope

This repository implements an end-to-end workflow for multimodal driver-state analysis in a simulator setup.

The system combines:

- Facial Action Units (AUs)
- Eye-tracking features (fixations, saccades, blinks, pupil behavior)

In addition, several machine learning model architectures are presented and evaluated.

Contents:

- Dataset generation
- Exploratory data analysis
- Model training experiments
- Real-time inference with SQLite, systemd and MQTT
- Repository file inventory
- Additional information

## 2) Dataset generation

### 2.1 Data Access, Filtering, and Data Conversion

Main scripts:

- `dataset_creation/create_parquet_files_from_owncloud.py`
- `dataset_creation/parquet_file_creation.py`

Purpose:

- Download and/or access the dataset files (either download them first via `EDA/owncloud_file_access.ipynb` or do both in one step with `dataset_creation/create_parquet_files_from_owncloud.py`)
- Keep relevant columns (FACE_AUs and eye-tracking raw values)
- Filter invalid samples (e.g., invalid level segments). Make sure not to drop rows where NaN is required for later feature creation; therefore use the `subset` argument of `dropna()` (see the sketch below)
- Export subject-level parquet files
- Before running the scripts: the whole dataset consists of 30 files of roughly 900 MB each, so provide enough storage and expect the conversion to take a while.
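
To illustrate the conversion step, the sketch below selects the relevant columns, drops rows only where the required columns are NaN, and exports a subject-level parquet file. The column names and the `convert_subject` helper are illustrative assumptions, not the actual code of the repository scripts.

```python
import pandas as pd

# Columns that must be valid for a row to be usable; NaN in other columns
# (e.g. blink markers) is kept because later feature creation relies on it.
REQUIRED_COLS = ["FACE_AU01", "FACE_AU02", "ET_pupil_diameter"]  # illustrative names

def convert_subject(raw_path: str, out_path: str) -> None:
    """Load one subject's raw recording and export a filtered parquet file."""
    df = pd.read_hdf(raw_path)                       # raw files are stored as .h5
    keep = [c for c in df.columns
            if c.startswith("FACE_AU") or c.startswith("ET_")]
    df = df[keep]
    # Drop rows only if one of the *required* columns is NaN,
    # so intentional NaNs elsewhere survive for feature engineering.
    df = df.dropna(subset=[c for c in REQUIRED_COLS if c in df.columns])
    df.to_parquet(out_path)

# convert_subject("subject_01.h5", "subject_01.parquet")
```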
### 2.2 Feature Engineering (Offline)

Main script:

- `dataset_creation/combined_feature_creation.py`

Behavior:

- Builds fixed-size sliding windows over each subject's time series (window size and step size can be adjusted, see the sketch below)
- Uses the prepared parquet files from 2.1
- Aggregates AU statistics per window (e.g., `FACE_AUxx_mean`)
- Computes eye-feature aggregates (fixation/saccade/blink/pupil metrics)
- Produces training-ready feature tables, i.e. the dataset
- The parameter `MIN_DUR_BLINKS` can be adjusted, although its value needs to make sense in combination with your sampling frequency
- With low video-stream rates, consider reevaluating the meaningfulness of some eye-tracking features, especially the fixations
- Running the script requires a manual installation of the [PyGazeAnalyser library](https://github.com/esdalmaijer/PyGazeAnalyser.git) from GitHub
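
As a rough illustration of the windowing step (not the actual implementation of `combined_feature_creation.py`), the sketch below aggregates AU means over fixed-size, overlapping windows; `WINDOW_SIZE`, `STEP_SIZE`, the `label` column and the helper name are placeholder assumptions.

```python
import pandas as pd

WINDOW_SIZE = 1500   # samples per window (placeholder)
STEP_SIZE = 250      # hop between window starts (placeholder)

def window_features(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-window statistics from one subject's parquet file."""
    au_cols = [c for c in df.columns if c.startswith("FACE_AU")]
    rows = []
    for start in range(0, len(df) - WINDOW_SIZE + 1, STEP_SIZE):
        window = df.iloc[start:start + WINDOW_SIZE]
        feats = {f"{c}_mean": window[c].mean() for c in au_cols}
        # Eye-tracking aggregates (fixations, saccades, blinks, pupil)
        # would be computed here as well, e.g. via PyGazeAnalyser detectors.
        feats["label"] = window["label"].mode().iloc[0]   # assumed label column
        rows.append(feats)
    return pd.DataFrame(rows)

# features = window_features(pd.read_parquet("subject_01.parquet"))
```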
### 2.3 Online Camera + Eye + AU Feature Extraction

Main scripts:

- `dataset_creation/camera_handling/camera_stream_AU_and_ET_new.py`
- `dataset_creation/camera_handling/eyeFeature_new.py`
- `dataset_creation/camera_handling/db_helper.py`

Runtime behavior:

- Captures the webcam stream with OpenCV
- Extracts gaze/iris-based signals via MediaPipe
- Records overlapping windows (`VIDEO_DURATION=50s`, `START_INTERVAL=5s`, `FPS=25`)
- Runs AU extraction (`py-feat`) on the recorded video segments
- An explanation of the py-feat functionality is located in `dataset_creation/AU_creation/pyfeat_docu.ipynb`
- Computes the eye-feature summary from the generated gaze parquet
- Writes merged rows to the SQLite table `feature_table` (see the sketch below)

Operational note:

- `DB_PATH` and other paths are currently code-configured and must be adapted per deployment.
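
The following is a minimal sketch of how a merged feature row could be appended to the SQLite `feature_table`; the schema and the helper name are assumptions for illustration, the real logic lives in `db_helper.py`.

```python
import sqlite3

DB_PATH = "features.db"   # placeholder; the real path is configured in the scripts

def insert_feature_row(row: dict) -> None:
    """Append one merged AU + eye-feature row to the feature_table."""
    con = sqlite3.connect(DB_PATH)
    try:
        cols = ", ".join(row.keys())
        placeholders = ", ".join("?" for _ in row)
        con.execute(
            f"INSERT INTO feature_table ({cols}) VALUES ({placeholders})",
            list(row.values()),
        )
        con.commit()
    finally:
        con.close()

# insert_feature_row({"start_time": "2024-01-01T12:00:00",
#                     "FACE_AU01_mean": 0.12, "fix_count": 7})
```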
## 3) EDA

The directory `EDA` provides several files to get insights into both the raw data from AdaBase and your own dataset.

- `EDA.ipynb` - Main EDA notebook: recreates the plot from the AdaBase documentation, lists all experiments and in general serves as a playground to get to know the files.
- `distribution_plots.ipynb` - This notebook aims to visualize the data distributions for each experiment; the goal is to find out whether the split of experiments into high and low cognitive load becomes clearer if some experiments are dropped.
- `histogramms.ipynb` - Histogram analysis of low load vs. high load per feature. Additionally, scatter plots per feature are available.
- `researchOnSubjectPerformance.ipynb` - This notebook examines how the performance values range across the 30 subjects. The code creates and saves a table in CSV format, which is later used as the foundation of the performance-based split in `model_training/tools/performance_based_split`
- `owncloud_file_access.ipynb` - Get access to the files via ownCloud and save them as .h5 files, in correspondence with the parquet file creation script
- `login.yaml` - Used to store URL and password to access files from ownCloud, used in the previous notebook
- `calculate_replacement_values.ipynb` - Fallback/median computation notebook for deployment; also creates the corresponding YAML syntax for embedding (see the sketch below)
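
The idea behind `calculate_replacement_values.ipynb` can be sketched as follows: compute per-feature medians over the dataset and render them as a YAML `fallback` block that can be pasted into `predict_pipeline/config.yaml`. Feature names and the exact output layout are assumptions.

```python
import pandas as pd
import yaml

def build_fallback_yaml(features: pd.DataFrame) -> str:
    """Compute per-feature medians and render them as a YAML 'fallback' block."""
    medians = features.median(numeric_only=True).round(4).to_dict()
    return yaml.safe_dump({"fallback": medians}, sort_keys=True)

# print(build_fallback_yaml(pd.read_parquet("combined_features.parquet")))
```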
General information:

- Due to their size, it is strongly recommended to download and save the dataset files once at the beginning
- For better data understanding, read the [AdaBase publication](https://www.mdpi.com/1424-8220/23/1/340)

## 4) Model Training

Included model families:

- CNN variants (different fusion strategies)
- XGBoost
- Isolation Forest*
- OCSVM*
- DeepSVDD*

\* These training strategies are unsupervised, which means only low-cognitive-load samples are used for training. Validation then also considers high-load samples.

Supporting utilities in `model_training/tools`:

- `scaler.py`: Functions to fit, transform, save and load either a MinMaxScaler or a StandardScaler, both subject-wise and globally. For new subjects, a fallback scaler (using the mean of all subjects' scaling parameters) is used (see the sketch below)
- `performance_split.py`: Provides a function to split a group of subjects based on their performance in the AdaBase experiments, using the results created in `researchOnSubjectPerformance.ipynb`. To split into three groups for train, validation & test, call the function twice
- `mad_outlier_removal.py`: Functions to fit and transform data with MAD outlier removal
- `evaluation_tools.py`: Mainly used for the Isolation Forest; functions for the ROC curve as well as the confusion matrix
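
A minimal sketch of the fallback-scaler idea (averaging the per-subject scaling parameters of a StandardScaler so that data from unseen subjects can still be scaled); it mirrors the concept behind `scaler.py` but is not its actual code.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def fit_subject_scalers(frames):
    """Fit one StandardScaler per subject; frames maps subject_id -> feature array."""
    return {sid: StandardScaler().fit(X) for sid, X in frames.items()}

def build_fallback_scaler(scalers):
    """Average the per-subject means/scales to scale data from unseen subjects."""
    fallback = StandardScaler()
    fallback.mean_ = np.mean([s.mean_ for s in scalers.values()], axis=0)
    fallback.scale_ = np.mean([s.scale_ for s in scalers.values()], axis=0)
    fallback.var_ = fallback.scale_ ** 2
    fallback.n_features_in_ = next(iter(scalers.values())).n_features_in_
    return fallback

# scalers = fit_subject_scalers({"s01": X_s01, "s02": X_s02})   # placeholder arrays
# new_subject_scaled = build_fallback_scaler(scalers).transform(X_new)
```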
### 4.1 CNNs

### 4.2 XGBoost

This documentation outlines the evolution of the XGBoost classification pipeline for cognitive workload detection. The project transitioned from a basic unimodal setup to a multi-stage hybrid system incorporating statistical filtering and deep feature extraction.

During model creation, several methods were used to improve the model accuracy. The biggest challenge throughout training was the strong overfitting of the model. Even in the last version with explicit regularization parameters, the overall accuracy could not be improved beyond what the earlier methods achieved.

Overall the model did not perform particularly well: the highest accuracy we could achieve was around 65%, which is slightly higher than Fraunhofer reported in the ADABase paper.

### 4.2.1 Classical XGBoost Baseline

To establish a performance baseline, a classical Extreme Gradient Boosting (XGBoost) model was implemented. XGBoost was selected for its ability to handle non-linear relationships and its inherent regularization, which helps prevent overfitting in high-dimensional feature spaces like Facial Action Units. It was also picked because of its usage in the ADABase paper. Initially, the model utilized raw Action Unit sums with global normalization to determine the basic predictability of workload from facial muscle activity alone.

| Metric / Model | Classical XGBoost |
| --- | --- |
| Accuracy | |
| AUC | |
| F1-Score | |

### 4.2.2 XGBoost with GroupKFold Validation

To address the challenge of inter-subject variability, the validation strategy was upgraded to `GroupKFold`. In behavioral data, samples from the same subject are highly correlated. Standard cross-validation often leads to data leakage, where the model memorizes individual facial characteristics. By ensuring that a subject's data is never shared between the training and validation sets, this iteration provides a scientifically rigorous measure of how the model generalizes to entirely unseen individuals. A minimal sketch of subject-grouped validation follows the table below.

| Metric / Model | XGBoost (GroupKFold) |
| --- | --- |
| Accuracy | |
| AUC | |
| F1-Score | |
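
A minimal sketch of subject-grouped cross-validation with XGBoost, assuming NumPy arrays `X`, binary labels `y` and a `subjects` array with one subject ID per row; the hyperparameters are illustrative, not the tuned values from the notebook.

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

def grouped_cv_auc(X, y, subjects, n_splits=5):
    """Cross-validate so that no subject appears in both train and validation."""
    aucs = []
    for train_idx, val_idx in GroupKFold(n_splits=n_splits).split(X, y, groups=subjects):
        model = XGBClassifier(n_estimators=300, max_depth=3, learning_rate=0.05,
                              eval_metric="logloss")
        model.fit(X[train_idx], y[train_idx])
        proba = model.predict_proba(X[val_idx])[:, 1]
        aucs.append(roc_auc_score(y[val_idx], proba))
    return float(np.mean(aucs))
```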
### 4.2.3 Hybrid XGBoost with Autoencoder

To improve feature quality, a hybrid approach was introduced by pre-training a deep autoencoder. The encoder branch was used to compress 20 raw Action Units into a 5-dimensional latent space. This non-linear dimensionality reduction aims to capture muscle synergies and filter out noise that decision trees might struggle with. The XGBoost classifier was then trained on these machine-learned representations rather than on the raw inputs. A sketch of this idea follows the table below.

| Metric / Model | XGBoost + Autoencoder |
| --- | --- |
| Accuracy | |
| AUC | |
| F1-Score | |
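
A compact sketch of the hybrid idea: pre-train an autoencoder on the 20 AU features, then train XGBoost on the 5-dimensional encodings. The intermediate layer sizes and all hyperparameters are assumptions, not the notebook's exact architecture.

```python
import tensorflow as tf
from xgboost import XGBClassifier

def train_hybrid(X_train, y_train, latent_dim=5, epochs=50):
    """Pre-train an autoencoder on AU features and train XGBoost on the encodings."""
    n_features = X_train.shape[1]            # 20 Action Units in this report
    inputs = tf.keras.Input(shape=(n_features,))
    encoded = tf.keras.layers.Dense(12, activation="relu")(inputs)
    encoded = tf.keras.layers.Dense(latent_dim, activation="relu")(encoded)
    decoded = tf.keras.layers.Dense(12, activation="relu")(encoded)
    decoded = tf.keras.layers.Dense(n_features, activation="linear")(decoded)

    autoencoder = tf.keras.Model(inputs, decoded)
    encoder = tf.keras.Model(inputs, encoded)
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(X_train, X_train, epochs=epochs, batch_size=64, verbose=0)

    # XGBoost now learns on the 5-dimensional latent representation.
    clf = XGBClassifier(n_estimators=300, max_depth=3, eval_metric="logloss")
    clf.fit(encoder.predict(X_train, verbose=0), y_train)
    return encoder, clf
```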
### 4.2.4 Robust XGBoost with MAD Outlier Removal

Recognizing that physiological and AU data often contain sensor artifacts, a robust preprocessing layer was added using the Median Absolute Deviation (MAD). Unlike the standard deviation, the MAD is resilient to extreme outliers. By calculating a robust z-score and filtering signals in the training set, the model learned from a "clean" representation of cognitive states, significantly improving the stability of the gradient boosting process. A sketch of the MAD filter follows the table below.

| Metric / Model | XGBoost + MAD |
| --- | --- |
| Accuracy | |
| AUC | |
| F1-Score | |
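
A minimal sketch of MAD-based filtering via the robust z-score; the threshold of 3.5 and the row-wise filtering strategy are common defaults used here for illustration, not necessarily what `mad_outlier_removal.py` does.

```python
import numpy as np

def mad_filter(X, threshold=3.5):
    """Drop rows whose robust z-score exceeds the threshold in any feature."""
    median = np.median(X, axis=0)
    mad = np.median(np.abs(X - median), axis=0)
    mad = np.where(mad == 0, 1e-9, mad)             # avoid division by zero
    robust_z = 0.6745 * (X - median) / mad          # 0.6745 ~ MAD-to-sigma factor
    keep = np.all(np.abs(robust_z) <= threshold, axis=1)
    return X[keep], keep

# X_clean, mask = mad_filter(X_train)
# y_clean = y_train[mask]
```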
### 4.2.5 Combined Dataset of Action Units and Eye-Tracking

This iteration refined the robust pipeline on a new, expanded dataset that integrates both the high-frequency facial Action Units and the eye-tracking metrics (pupillometry and fixations).

Since it was in doubt whether the eye-tracking data could be recreated in the lab, only Action Units were used in the first XGBoost models. From this iteration on, the eye-tracking data is included as features as well.

By applying performance-based subject splitting, we ensured that the training and test sets were balanced not only by label but also by the subjects' underlying skill levels, resulting in the most deployable version of the model.

| Metric / Model | Final Combined Model |
| --- | --- |
| Accuracy | |
| AUC | |
| F1-Score | |

### 4.2.6 Regularized XGBoost with Complexity Control

Building upon the robust preprocessing of the previous steps, this iteration focuses on strict **complexity control** within the XGBoost architecture. To mitigate the 100% training accuracy observed in earlier unimodal tests (a clear indicator of overfitting), we introduced explicit **L1 (`reg_alpha`)** and **L2 (`reg_lambda`)** regularization parameters into the grid-search space.

By penalizing large weights and promoting feature sparsity, the model is forced to prioritize the most globally relevant Action Units. Furthermore, the tree depth was intentionally restricted (`max_depth`: 2-4), and an **early stopping** callback with a 30-round patience window was implemented. This ensures that training terminates at the point of optimal generalization, capturing the essential physiological trends of cognitive load while ignoring subject-specific noise. A sketch of this configuration follows the table below.

| Metric / Model | Regularized XGBoost |
| --- | --- |
| Accuracy | |
| AUC | |
| F1-Score | |
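
A minimal sketch of the regularized setup: explicit L1/L2 penalties, restricted tree depth and early stopping with a 30-round patience window. The concrete values are illustrative; the notebook tunes them via grid search.

```python
from xgboost import XGBClassifier

def train_regularized(X_train, y_train, X_val, y_val):
    """XGBoost with explicit regularization and early stopping."""
    model = XGBClassifier(
        n_estimators=1000,        # upper bound; early stopping picks the real size
        max_depth=3,              # restricted depth (2-4 in the grid search)
        learning_rate=0.05,
        reg_alpha=1.0,            # L1 penalty promoting sparsity
        reg_lambda=5.0,           # L2 penalty on leaf weights
        subsample=0.8,
        eval_metric="logloss",
        early_stopping_rounds=30, # 30-round patience window
    )
    model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
    return model
```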
### 4.3 Isolation Forest

As a start into unsupervised learning techniques, `IsolationForest.ipynb` was created to research how well a simple ensemble classifier performs on the created dataset.

The notebook comes with a one-class grid search for hyperparameter tuning as well as a ROC curve that allows manual fine-tuning.

Overall, our experiments have shown that this approach is not sufficient, with the following results:

| Metric / Model | Isolation Forest |
|----------------|---------|
| Best Balanced Accuracy | 0.57 |
| Best AUC | 0.61 |

In detail, the classifier tends to classify the majority of samples as low load and is therefore not sufficient to be used for later deployment. A minimal sketch of the one-class training setup follows below.
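
A minimal sketch of the one-class training setup: fit the Isolation Forest on low-load samples only and evaluate on a mixed validation set. The `contamination` value and the label convention (0 = low load, 1 = high load) are assumptions.

```python
from sklearn.ensemble import IsolationForest
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

def train_iforest(X_train_low, X_val, y_val):
    """Fit on low-load data only; anomalies are interpreted as high load."""
    model = IsolationForest(n_estimators=200, contamination=0.1, random_state=42)
    model.fit(X_train_low)

    # decision_function: higher = more normal, so negate it as a high-load score
    scores = -model.decision_function(X_val)
    preds = (model.predict(X_val) == -1).astype(int)   # -1 (anomaly) -> high load

    return (balanced_accuracy_score(y_val, preds),
            roc_auc_score(y_val, scores))
```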
### 4.4 One Class SVM with Autoencoder

Training a One Class SVM directly on the dataset resulted in every sample being predicted as an anomaly.

In the next step, an autoencoder is pretrained to learn a representation of the data. Afterwards, the encoder is used for preprocessing, so the OCSVM is trained on the encoder output.

The training includes hyperparameter tuning through grid-search cross-validation.

The encoder output is visualized with print statements and plots that show the encoded data for both low- and high-load samples.

We see that the encoder struggles to represent the unseen high-load samples differently. As a consequence, the One Class SVM also does not achieve sufficient performance.

| Metric / Model | One Class SVM |
|----------------|---------|
| Best Balanced Accuracy | 0.62 |

When the notebook is run completely, both the trained encoder and the SVM are saved for later use, provided the save paths are set correctly.

### 4.5 Deep SVDD

Similar to the OCSVM training, an autoencoder is used to preprocess the data before the actual Deep SVDD training. Nevertheless, the usage is partially different: Deep SVDD takes the pretrained encoder and fine-tunes it by applying a different loss function (which results from the theoretical concept behind Deep SVDD). This means the encoder weights are still modified during the actual Deep SVDD training.

Also, this approach includes **hybrid fusion of modalities**. Instead of putting all features into the same input layer, the neural network is divided into two branches that process Action Units and eye-tracking features separately.

Then, after two Dense layers each, the branches are fused by concatenation. From there, another two Dense layers process the data (a sketch of this branch layout follows the table below).

The decoder is not an exact mirror image, as the split back into the modalities happens at the very end.

To compute the total loss, the losses from both modalities are summed; the loss weights can be changed by the user. Training consists of 2x2 phases: both the autoencoder and, later, Deep SVDD are first trained with a larger learning rate and then fine-tuned with a smaller one.

| Metric / Model | Deep SVDD |
|----------------|---------|
| Best Balanced Accuracy | 0.60 |
| Best AUC | 0.57 |
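
A minimal sketch of the two-branch (hybrid fusion) encoder described above, written with Keras. Layer widths, the latent size and the feature split are placeholder assumptions; the Deep SVDD fine-tuning stage, which replaces the reconstruction loss with a distance-to-center loss, is only indicated in the comments.

```python
import tensorflow as tf

def build_hybrid_encoder(n_au_features, n_eye_features, latent_dim=8):
    """Two input branches (AUs / eye tracking), fused by concatenation."""
    au_in = tf.keras.Input(shape=(n_au_features,), name="action_units")
    eye_in = tf.keras.Input(shape=(n_eye_features,), name="eye_features")

    # Two Dense layers per branch before fusion
    au = tf.keras.layers.Dense(32, activation="relu")(au_in)
    au = tf.keras.layers.Dense(16, activation="relu")(au)
    eye = tf.keras.layers.Dense(32, activation="relu")(eye_in)
    eye = tf.keras.layers.Dense(16, activation="relu")(eye)

    fused = tf.keras.layers.Concatenate()([au, eye])
    fused = tf.keras.layers.Dense(16, activation="relu")(fused)
    latent = tf.keras.layers.Dense(latent_dim, activation="relu")(fused)

    # For Deep SVDD, this encoder is later fine-tuned with a loss that pulls
    # encodings of low-load samples towards a fixed center c, e.g.
    # loss = mean(||encoder(x) - c||^2), instead of the reconstruction loss.
    return tf.keras.Model([au_in, eye_in], latent, name="hybrid_encoder")

# encoder = build_hybrid_encoder(n_au_features=20, n_eye_features=15)
```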
### 4.6 General information on unsupervised approaches

As described above, the unsupervised approaches did not meet the requirements in terms of prediction performance. For all models, both MinMax scaling and standard scaling were tried, each both subject-wise and globally. Unfortunately, the differences were small, which is why the preprocessing was not discussed in detail above.

Future research should keep in mind that while subject-wise scaling might be better for training, it makes deployment on new subjects more difficult. Our solution, as implemented in `model_training/tools/scaler.py`, calculates a fallback scaler (using the mean of all subjects' scaling parameters).

## 5) Real-Time Prediction and Messaging

Main script:

- `predict_pipeline/predict_sample.py`

Pipeline:

- Loads the runtime config (`predict_pipeline/config.yaml`)
- Pulls the latest row from SQLite
- Replaces missing values using the `fallback` map from the config file; if more than 50% of the values need to be replaced, the sample is dropped and `valid` is set to `false`
- Optionally applies a scaler (`.pkl`/`.joblib`), set via the config file
- Loads the model (`.keras`, `.pkl`, `.joblib`) and predicts
- Publishes a JSON payload to an MQTT topic (see the sketch below)

Expected payload form:

```json
{
  "valid": true,
  "_id": 123,
  "prediction": 0
}
```

- `valid`: false only if too many signals are invalid
- `_id`: the sample ID from the database
- `prediction`: 0 for low load, 1 for high load
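
A minimal sketch of the last pipeline step, publishing the payload via paho-mqtt. Broker address and topic are placeholders; in the real script they come from `config.yaml`.

```python
import json
import paho.mqtt.publish as publish

def publish_prediction(sample_id, prediction, valid=True,
                       broker="localhost", topic="driver_state/prediction"):
    """Publish one prediction result as a JSON payload (one-shot connect)."""
    payload = {"valid": bool(valid), "_id": int(sample_id), "prediction": int(prediction)}
    publish.single(topic, payload=json.dumps(payload), hostname=broker, qos=1)

# publish_prediction(sample_id=123, prediction=0)
```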
### 5.1 Scheduled Prediction (Linux)

Files:

- `predict_pipeline/predict.service`
- `predict_pipeline/predict.timer`

Role:

- Run inference repeatedly without manual execution
- Timer/service configuration can be customized

More information on how to use and interact with the systemd service and timer can be found in [predict_service_timer_documentation.md](/predict_pipeline/predict_service_timer_documentation.md)
### 5.2 Runtime Configuration

Primary config file:

- `predict_pipeline/config.yaml`

Sections:

- `database`: SQLite location + table + sort key
- `model`: model path
- `scaler`: scaler usage + path
- `mqtt`: broker and publish format
- `sample.columns`: expected feature order
- `fallback`: default values for NaN replacement

Important:

- The repository currently uses environment-specific absolute paths in some scripts/configs to ensure functionality on the Ohm-UX driving simulator.

An illustrative skeleton of such a config is sketched below.
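
Only the top-level section names in this skeleton are taken from the description above; all keys below them and all values are placeholder assumptions, not the repository's actual schema.

```yaml
# Illustrative skeleton only - keys/values below the section level are assumptions.
database:
  path: /path/to/features.db
  table: feature_table
  sort_key: _id            # newest row is selected by this column

model:
  path: /path/to/model.keras

scaler:
  use: true
  path: /path/to/scaler.pkl

mqtt:
  broker: localhost
  topic: driver_state/prediction

sample:
  columns: [FACE_AU01_mean, FACE_AU02_mean, fix_count]   # expected feature order

fallback:
  FACE_AU01_mean: 0.12     # default values for NaN replacement
  fix_count: 6
```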
### 5.3 Data and Feature Expectations

Prediction expects SQLite rows containing:

- `_Id`
- `start_time` - not yet used for either predictions or messages
- All configured model features (AUs + eye metrics)

Common feature groups (similar to the own dataset):

- `FACE_AUxx_mean` columns
- Fixation counters and duration statistics
- Saccade count/amplitude/duration statistics
- Blink count/duration statistics
- Pupil mean and IPA

### 5.4 Create Database from Scratch

To (re-)create the custom database for deployment, use `fill_db.ipynb`. Enter the path to your dataset, drop unnecessary columns and insert a subset of the data with the tool functions from `tools/db_helpers`.

## 6) Installation and Dependencies

Due to unsolvable dependency conflicts, several environments need to be used at the same time.

### 6.1 Environment for camera handling

The setup of a virtual environment for the camera handling is difficult due to various dependency conflicts.

It is therefore necessary to create the virtual environment with every package in its specific version and to install each package in the specified order.

Furthermore, the environment needs to be based on Python 3.10. The specific versions and the installation order of the packages are described in the file:

`requirements.txt`

### 6.2 Environment for predictions

If you want to use the existing deployment on the Ohm-UX driving simulator's Jetson board, activate the conda environment `p310_FS_TF`, a Python 3.10 environment including TensorFlow and all other packages required to run `predict_sample.py`.

Otherwise, as described in `readme.md: Setup`, you can use `prediction_env.yaml` to create a new environment that fulfills the requirements.

## 7) Repository File Inventory

### Root

- `.gitignore` - Git ignore rules
- `readme.md` - minimal quickstart documentation
- `project_report.md` - full technical documentation (this file)
- `requirements.txt` - Python dependencies

### Dataset Creation

- `dataset_creation/parquet_file_creation.py` - local files to parquet conversion
- `dataset_creation/create_parquet_files_from_owncloud.py` - ownCloud download + parquet conversion
- `dataset_creation/combined_feature_creation.py` - sliding-window multimodal feature generation
- `dataset_creation/maxDist.py` - helper/statistical utility script for eye-tracking feature creation

#### AU Creation

- `dataset_creation/AU_creation/pyfeat_docu.ipynb` - py-feat exploratory notes

#### Camera Handling

- `dataset_creation/camera_handling/camera_stream_AU_and_ET_new.py` - current camera + AU + eye online pipeline
- `dataset_creation/camera_handling/eyeFeature_new.py` - eye-feature extraction from gaze parquet
- `dataset_creation/camera_handling/db_helper.py` - SQLite helper functions (camera pipeline)
- `dataset_creation/camera_handling/camera_stream.py` - baseline camera streaming script
- `dataset_creation/camera_handling/db_test.py` - DB test utility

### EDA

- `EDA/EDA.ipynb` - main EDA notebook
- `EDA/distribution_plots.ipynb` - distribution visualization
- `EDA/histogramms.ipynb` - histogram analysis
- `EDA/researchOnSubjectPerformance.ipynb` - subject-level analysis
- `EDA/owncloud_file_access.ipynb` - ownCloud exploration/access notebook
- `EDA/calculate_replacement_values.ipynb` - fallback/median computation notebook
- `EDA/login.yaml` - local auth/config artifact for EDA workflows

### Model Training

#### CNN

- `model_training/CNN/CNN_simple.ipynb`
- `model_training/CNN/CNN_crossVal.ipynb`
- `model_training/CNN/CNN_crossVal_EarlyFusion.ipynb`
- `model_training/CNN/CNN_crossVal_EarlyFusion_Filter.ipynb`
- `model_training/CNN/CNN_crossVal_EarlyFusion_Test_Eval.ipynb`
- `model_training/CNN/CNN_crossVal_faceAUs.ipynb`
- `model_training/CNN/CNN_crossVal_faceAUs_eyeFeatures.ipynb`
- `model_training/CNN/CNN_crossVal_HybridFusion.ipynb`
- `model_training/CNN/CNN_crossVal_HybridFusion_Test_Eval.ipynb`
- `model_training/CNN/deployment_pipeline.ipynb`

#### XGBoost

- `model_training/xgboost/xgboost.ipynb`
- `model_training/xgboost/xgboost_groupfold.ipynb`
- `model_training/xgboost/xgboost_new_dataset.ipynb`
- `model_training/xgboost/xgboost_regulated.ipynb`
- `model_training/xgboost/xgboost_with_AE.ipynb`
- `model_training/xgboost/xgboost_with_MAD.ipynb`

#### Isolation Forest

- `model_training/IsolationForest/iforest_training.ipynb`

#### OCSVM

- `model_training/OCSVM/ocsvm_with_AE.ipynb`

#### DeepSVDD

- `model_training/DeepSVDD/deepSVDD.ipynb`

#### MAD Outlier Removal

- `model_training/MAD_outlier_removal/mad_outlier_removal.ipynb`
- `model_training/MAD_outlier_removal/mad_outlier_removal_median.ipynb`

#### Shared Training Tools

- `model_training/tools/scaler.py`
- `model_training/tools/performance_split.py`
- `model_training/tools/mad_outlier_removal.py`
- `model_training/tools/evaluation_tools.py`

### Prediction Pipeline

- `predict_pipeline/predict_sample.py` - runtime prediction + MQTT publish
- `predict_pipeline/config.yaml` - runtime database/model/scaler/mqtt config
- `predict_pipeline/fill_db.ipynb` - helper notebook for DB setup/testing
- `predict_pipeline/predict.service` - systemd service unit
- `predict_pipeline/predict.timer` - systemd timer unit
- `predict_pipeline/predict_service_timer_documentation.md` - Linux service/timer guide

### Generic Tools

- `tools/db_helpers.py` - common SQLite utilities used to get the newest sample for prediction

## 8) Additional Information

- Several paths are hardcoded on purpose to ensure compatibility with the Jetson board at the OHM-UX driving simulator.
- Camera and AU processing are resource-intensive; version pinning and hardware validation are recommended.