upload CNN report

This commit is contained in:
Celina Korzer 2026-03-19 17:42:32 +01:00
parent 3701d11c77
commit 0483c3fea3


Supporting utilities in `model_training/tools`:
- `mad_outlier_removal.py`: Functions to fit and transform data with MAD outlier removal
- `evaluation_tools.py`: Functions for ROC curves and confusion matrices, used especially for Isolation Forest
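The fit/transform pattern behind MAD outlier removal can be sketched in a few lines of numpy. This is a generic illustration of the technique, not the project's `mad_outlier_removal.py` code; the modified z-score cutoff of 3.5 and the helper names are assumptions:

```python
import numpy as np

def fit_mad(x):
    """Compute median and median absolute deviation (MAD) of a 1D array."""
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    return median, mad

def mad_outlier_mask(x, median, mad, threshold=3.5):
    """Return a boolean mask that is True for inliers.

    Uses the modified z-score 0.6745 * (x - median) / mad; the 3.5
    cutoff is a common default, not necessarily the project's choice.
    """
    if mad == 0:
        return np.ones_like(x, dtype=bool)
    modified_z = 0.6745 * (x - median) / mad
    return np.abs(modified_z) <= threshold

x = np.array([1.0, 1.1, 0.9, 1.05, 10.0])
median, mad = fit_mad(x)
mask = mad_outlier_mask(x, median, mad)
print(x[mask])  # the extreme value 10.0 is dropped
```

Fitting the median/MAD on training data and reusing them to mask validation data keeps the filter free of test-set leakage.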
### 4.1 CNNs
This section summarizes all CNN-based supervised learning approaches implemented in the project.
All models operate on **facial Action Unit (AU)** features and, depending on the notebook, additional **eye-tracking features**.
The notebooks differ in evaluation methodology, fusion strategy, and experimental intention.
### 4.1.1 Baseline CNN (Notebook: *CNN_simple*)
The first notebook implements a simple 1D CNN to establish a baseline for AU-only classification.
The model uses two convolutional layers, batch normalization, max pooling, and a regularized dense head.
A single subject-exclusive train/validation/test split is used.
The intention behind this notebook is to:
- Provide a **baseline performance level**
- Validate that AU features contain discriminative information
- Identify overfitting tendencies before moving to more rigorous evaluation
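The baseline architecture described above could be sketched in Keras roughly as follows. Filter counts, kernel sizes, the dropout rate, the L2 weight, and the number of AU features are all illustrative assumptions, not the notebook's actual settings:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 17  # number of facial AUs per sample (assumed)

# Two conv blocks with batch norm and max pooling, then a regularized
# dense head -- a generic sketch of the baseline described in the text.
model = keras.Sequential([
    layers.Input(shape=(n_features, 1)),   # AUs treated as a 1D sequence
    layers.Conv1D(32, 3, padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling1D(2),
    layers.Conv1D(64, 3, padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling1D(2),
    layers.Flatten(),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=keras.regularizers.l2(1e-3)),
    layers.Dropout(0.5),                   # regularized dense head
    layers.Dense(1, activation="sigmoid"), # binary workload label
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

probs = model.predict(np.random.rand(4, n_features, 1), verbose=0)
```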
### 4.1.2 Cross-Validated CNN (Notebook: *CNN_crossVal*)
This notebook introduces **5-fold GroupKFold cross-validation**, ensuring subject-exclusive folds.
The architecture is similar to the baseline but includes stronger regularization and a lower learning rate.
The intention behind this notebook is to:
- Provide **robust generalization estimates**
- Reduce variance caused by single-split evaluation
- Establish a cross-validated AU-only benchmark
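The subject-exclusive folding works by passing subject IDs as the `groups` argument to scikit-learn's `GroupKFold`; a minimal sketch with toy data (feature dimensions and subject counts are assumptions):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data: 20 samples from 5 subjects, 4 samples each.
X = np.random.rand(20, 17)            # 17 AU features per sample (assumed)
y = np.random.randint(0, 2, size=20)
subjects = np.repeat(np.arange(5), 4)

gkf = GroupKFold(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(gkf.split(X, y, groups=subjects)):
    train_subjects = set(subjects[train_idx])
    val_subjects = set(subjects[val_idx])
    # No subject ever appears on both sides of a fold.
    assert train_subjects.isdisjoint(val_subjects)
    print(f"fold {fold}: validation subjects {sorted(val_subjects)}")
```

Grouping by subject rather than by sample is what makes the folds estimate generalization to unseen people instead of unseen frames of known people.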
### 4.1.3 Cross-Validated CNN (Face AUs Only) (Notebook: *CNN_crossVal_faceAUs*)
This notebook is a streamlined version of the previous one, removing unused eye-tracking features and focusing exclusively on AUs.
The intention behind this notebook is to:
- Provide a **clean AU-only benchmark**
- Improve reproducibility and interpretability
- Prepare for multimodal comparisons
### 4.1.4 Cross-Validated CNN with Early Fusion (AUs + Eye Features) (Notebook: *CNN_crossVal_faceAUs_eyeFeatures*)
This notebook introduces **early fusion**, concatenating AU and eye-tracking features into a single input vector.
The architecture remains identical to the AU-only models.
The intention behind this notebook is to:
- Evaluate whether multimodal early fusion improves performance
- Establish a first multimodal baseline
- Analyze class-specific behavior via confusion matrices
This notebook did not lead to any useful results.
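Early fusion as described here amounts to concatenating the per-sample feature vectors along the feature axis before the network sees them; a minimal numpy sketch (the feature dimensions are assumptions):

```python
import numpy as np

au_features = np.random.rand(100, 17)   # facial AU features (assumed dims)
eye_features = np.random.rand(100, 6)   # eye-tracking features (assumed dims)

# Early fusion: one joint input vector per sample.
fused = np.concatenate([au_features, eye_features], axis=1)
print(fused.shape)  # (100, 23)
```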
### 4.1.5 Cross-Validated CNN with Early Fusion (Refined Version) (Notebook: *CNN_crossVal_EarlyFusion*)
This notebook refines the early-fusion approach by removing samples with missing values and ensuring consistent multimodal input quality.
The intention behind this notebook is to:
- Provide a **clean and fully validated early-fusion model**
- Investigate multimodal complementarity under rigorous CV
- Improve interpretability through aggregated confusion matrices
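Dropping samples with missing values across modalities typically reduces to a row-wise NaN mask over the fused matrix; a generic sketch of the pattern, not the notebook's exact code:

```python
import numpy as np

au = np.array([[0.1, 0.2], [np.nan, 0.4], [0.5, 0.6]])
eye = np.array([[1.0], [2.0], [np.nan]])

fused = np.concatenate([au, eye], axis=1)
# Keep only rows where every modality is fully observed.
complete = ~np.isnan(fused).any(axis=1)
fused_clean = fused[complete]
print(fused_clean)  # only the first row survives
```

Filtering after fusion guarantees that a sample missing data in *either* modality is removed from both, keeping the multimodal inputs aligned.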
### 4.1.6 Cross-Validated CNN with Early Fusion and Subset Filtering (Notebook: *CNN_crossVal_EarlyFusion_Filter*)
This notebook applies domain-specific filtering to isolate a more homogeneous subset of cognitive states before training.
The intention behind this notebook is to:
- Evaluate whether **subset filtering** improves multimodal learning
- Reduce dataset heterogeneity
- Provide a controlled multimodal benchmark
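Subset filtering of this kind usually reduces to a boolean mask over sample-level labels or metadata; a hypothetical sketch in which the state names are invented for illustration:

```python
import numpy as np

labels = np.array(["low", "high", "rest", "high", "low"])
X = np.random.rand(5, 23)

# Keep only the cognitive states of interest (example subset).
keep = np.isin(labels, ["low", "high"])
X_sub, labels_sub = X[keep], labels[keep]
print(labels_sub)  # ['low' 'high' 'high' 'low']
```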
### 4.1.7 Hybrid-Fusion CNN (Notebook: *CNN_crossVal_HybridFusion*)
This notebook introduces a **hybrid-fusion architecture** with two modality-specific branches:
- A 1D CNN for AUs
- A dense MLP for eye-tracking features
The branches are fused before classification.
The intention behind this notebook is to:
- Allow each modality to learn **specialized representations**
- Evaluate whether hybrid fusion outperforms early fusion
- Provide a strong multimodal benchmark
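A two-branch model of this shape could be sketched with the Keras functional API as below. Layer sizes and feature dimensions are illustrative assumptions, not the notebook's configuration:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_aus, n_eye = 17, 6  # per-modality feature counts (assumed)

# CNN branch for the AU sequence.
au_in = keras.Input(shape=(n_aus, 1), name="aus")
x = layers.Conv1D(32, 3, padding="same", activation="relu")(au_in)
x = layers.MaxPooling1D(2)(x)
x = layers.Flatten()(x)

# Dense MLP branch for the eye-tracking features.
eye_in = keras.Input(shape=(n_eye,), name="eye")
e = layers.Dense(32, activation="relu")(eye_in)
e = layers.Dense(16, activation="relu")(e)

# Fuse the branch representations before the classifier head.
merged = layers.concatenate([x, e])
h = layers.Dense(32, activation="relu")(merged)
out = layers.Dense(1, activation="sigmoid")(h)

model = keras.Model(inputs=[au_in, eye_in], outputs=out)
probs = model.predict(
    [np.random.rand(4, n_aus, 1), np.random.rand(4, n_eye)], verbose=0)
```

The design choice here is that each branch can learn a representation suited to its modality's structure before the fused features reach the shared classifier.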
### 4.1.8 Early-Fusion CNN with Independent Test Evaluation (Notebook: *CNN_crossVal_EarlyFusion_Test_Eval*)
This notebook introduces the first **true held-out test evaluation** for an early-fusion CNN.
A subject-exclusive train/test split is created before cross-validation.
The intention behind this notebook is to:
- Provide a **deployment-realistic performance estimate**
- Compare validation-fold behavior with true test-set behavior
- Visualize ROC and PR curves for threshold analysis
| Metric / Model | CNN_crossVal_EarlyFusion_Test_Eval |
|----------------|-------------------------------------|
| Test Accuracy | |
| Test F1 | |
| Test AUC | |
| Balanced Accuracy | |
| Precision | |
| Recall | |
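A subject-exclusive hold-out split before cross-validation can be built with scikit-learn's `GroupShuffleSplit`, which splits on subject IDs rather than samples; the group counts and test fraction below are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.random.rand(40, 23)            # 10 subjects, 4 samples each (toy data)
y = np.random.randint(0, 2, size=40)
subjects = np.repeat(np.arange(10), 4)

# Hold out ~20% of *subjects* (not samples) before any cross-validation.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(gss.split(X, y, groups=subjects))

# Train and test never share a subject.
assert set(subjects[train_idx]).isdisjoint(set(subjects[test_idx]))
print(sorted(set(subjects[test_idx])))  # held-out subject IDs
```

Cross-validation then runs only on `train_idx`, and the held-out subjects are touched exactly once for the final test evaluation.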
### 4.1.9 Hybrid-Fusion CNN with Independent Test Evaluation (Notebook: *CNN_crossVal_HybridFusion_Test_Eval*)
This notebook extends hybrid fusion with a subject-exclusive train/test split and full test-set evaluation.
The intention behind this notebook is to:
- Evaluate hybrid fusion under **realistic deployment conditions**
- Compare hybrid vs. early fusion on unseen subjects
- Provide full diagnostic plots (ROC, PR, confusion matrices)
| Metric / Model | CNN_crossVal_HybridFusion_Test_Eval |
|----------------|--------------------------------------|
| Test Accuracy | |
| Test F1 | |
| Test AUC | |
| Balanced Accuracy | |
| Precision | |
| Recall | |
### 4.1.10 Summary
Across all nine notebooks, the project progresses from a simple AU-only baseline to advanced multimodal hybrid-fusion architectures with independent test evaluation.
This progression reflects increasing methodological rigor and prepares the foundation for selecting a final deployment model.
Ultimately, the experiments showed that **early fusion and hybrid fusion perform very similarly**, with no substantial performance advantage for either approach.
Furthermore, even when relying **solely on facial Action Unit data**, the models achieve **strong and competitive results**, indicating that AUs alone already capture a significant portion of the cognitive workload signal.
### 4.2 XGBoost
This documentation outlines the evolution of the XGBoost classification pipeline for cognitive workload detection. The project transitioned from a basic unimodal setup to a sophisticated, multi-stage hybrid system incorporating advanced statistical filtering and deep feature extraction.
Several methods were used during model creation to improve accuracy. The biggest challenge throughout training was the model's strong tendency to overfit. Even in the final version, with explicit regularization parameters, overall accuracy could not be improved beyond what the earlier methods achieved.
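Overfitting in XGBoost is typically curbed through parameters like the ones below; the specific values are illustrative assumptions, not the project's tuned settings:

```python
# Hypothetical XGBoost parameter set aimed at curbing overfitting;
# values are illustrative, not the project's tuned configuration.
params = {
    "objective": "binary:logistic",
    "max_depth": 4,           # shallower trees generalize better
    "learning_rate": 0.05,    # smaller steps, more boosting rounds
    "subsample": 0.8,         # row subsampling per tree
    "colsample_bytree": 0.8,  # feature subsampling per tree
    "reg_alpha": 1.0,         # L1 regularization on leaf weights
    "reg_lambda": 5.0,        # L2 regularization on leaf weights
    "min_child_weight": 5,    # minimum summed hessian per leaf
}

# These would be passed to xgboost.XGBClassifier(**params), usually
# together with early stopping on a validation set so boosting halts
# once validation performance stops improving.
print(sorted(params))
```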