From 96b3e35248a3e290476a164bdca0c60ef5318e00 Mon Sep 17 00:00:00 2001
From: Celina
Date: Thu, 19 Mar 2026 18:44:00 +0100
Subject: [PATCH] changes to CNN report

---
 project_report.md | 89 ++++++++++++++++++++++++++++-------------------
 1 file changed, 54 insertions(+), 35 deletions(-)

diff --git a/project_report.md b/project_report.md
index 8d8abd5..782126c 100644
--- a/project_report.md
+++ b/project_report.md
@@ -123,7 +123,7 @@ Supporting utilities in `model_training/tools`:
 ### 4.1 CNNs
 This section summarizes all CNN‑based supervised learning approaches implemented in the project.
-All models operate on **facial Action Unit (AU)** features and, depending on the notebook, additional **eye‑tracking features**.
+All models operate on facial Action Unit (AU) features and, depending on the notebook, additional eye‑tracking features.
 The notebooks differ in evaluation methodology, fusion strategy, and experimental intention.
 
 ### 4.1.1 Baseline CNN (Notebook: *CNN_simple*)
 The model uses two convolutional layers, batch normalization, max pooling, and a dense classification head.
 A single subject‑exclusive train/validation/test split is used.
 
 The intention behind this notebook is to:
-- Provide a **baseline performance level**
+- Provide a baseline performance level
 - Validate that AU features contain discriminative information
 - Identify overfitting tendencies before moving to more rigorous evaluation
 
 ### 4.1.2 Cross‑Validated CNN (Notebook: *CNN_crossVal*)
-This notebook introduces **5‑fold GroupKFold cross‑validation**, ensuring subject‑exclusive folds.
+This notebook introduces 5‑fold GroupKFold cross‑validation, ensuring subject‑exclusive folds.
 The architecture is similar to the baseline but includes stronger regularization and a lower learning rate.
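The subject‑exclusive folding that GroupKFold provides can be illustrated with a minimal scikit‑learn sketch; the array names, sizes, and subject layout below are toy stand‑ins, not the notebook's actual data:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy stand-ins for the AU feature matrix, labels, and per-sample subject IDs.
rng = np.random.default_rng(0)
X = rng.random((15, 4))                # 15 samples, 4 AU features
y = rng.integers(0, 2, 15)             # binary workload labels
subjects = np.repeat(np.arange(5), 3)  # 5 subjects, 3 samples each

gkf = GroupKFold(n_splits=5)
for train_idx, val_idx in gkf.split(X, y, groups=subjects):
    # Subject-exclusive: no subject appears in both partitions of a fold.
    assert set(subjects[train_idx]).isdisjoint(subjects[val_idx])
```

Because folds are formed over subjects rather than samples, a subject's entire recording lands in exactly one validation fold, which is what prevents identity leakage between training and validation.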
 The intention behind this notebook is to:
-- Provide **robust generalization estimates**
+- Provide robust generalization estimates
 - Reduce variance caused by single‑split evaluation
 - Establish a cross‑validated AU‑only benchmark
 
@@ -151,13 +151,14 @@ The intention behind this notebook is to:
 This notebook is a streamlined version of the previous one, removing unused eye‑tracking features and focusing exclusively on AUs.
 
 The intention behind this notebook is to:
-- Provide a **clean AU‑only benchmark**
+- Provide a clean AU‑only benchmark
 - Improve reproducibility and interpretability
 - Prepare for multimodal comparisons
 
-### 4.1.4 Cross‑Validated CNN with Early Fusion (AUs + Eye Features) (Notebook: *CNN_crossVal_faceAUs_eyeFeatures*)
-This notebook introduces **early fusion**, concatenating AU and eye‑tracking features into a single input vector.
+### 4.1.4 Cross‑Validated CNN with Early Fusion (AUs + Eye Features)
+(Notebook: *CNN_crossVal_faceAUs_eyeFeatures*)
+This notebook introduces early fusion, concatenating AU and eye‑tracking features into a single input vector.
 The architecture remains identical to AU‑only models.
 
 The intention behind this notebook is to:
@@ -171,7 +172,7 @@ This notebook didn't lead to any useful results.
 This notebook refines the early‑fusion approach by removing samples with missing values and ensuring consistent multimodal input quality.
 
 The intention behind this notebook is to:
-- Provide a **clean and fully validated early‑fusion model**
+- Provide a clean and fully validated early‑fusion model
 - Investigate multimodal complementarity under rigorous CV
 - Improve interpretability through aggregated confusion matrices
 
@@ -180,68 +181,86 @@ The intention behind this notebook is to:
 This notebook applies domain‑specific filtering to isolate a more homogeneous subset of cognitive states before training.
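The early fusion introduced in 4.1.4 and the missing‑value cleanup of 4.1.5 reduce to a feature concatenation followed by a row filter; a minimal NumPy sketch with invented toy arrays (the notebooks' actual feature matrices and preprocessing code may differ):

```python
import numpy as np

# Invented per-sample feature blocks standing in for AU and eye-tracking data.
au_feats = np.array([[0.1, 0.2],
                     [0.3, np.nan],   # sample with a missing value
                     [0.5, 0.6]])
eye_feats = np.array([[1.0], [2.0], [3.0]])

# Early fusion: concatenate both modalities into one input vector per sample.
fused = np.concatenate([au_feats, eye_feats], axis=1)   # shape (3, 3)

# 4.1.5-style cleanup: drop any sample missing a value in either modality.
clean = fused[~np.isnan(fused).any(axis=1)]             # shape (2, 3)
```

Filtering after fusion, as sketched here, guarantees that a sample is kept only when both modalities are complete, which matches the "consistent multimodal input quality" goal stated above.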
 The intention behind this notebook is to:
-- Evaluate whether **subset filtering** improves multimodal learning
+- Evaluate whether subset filtering improves multimodal learning
 - Reduce dataset heterogeneity
 - Provide a controlled multimodal benchmark
 
-### 4.1.7 Hybrid‑Fusion CNN
-(Notebook: *CNN_crossVal_HybridFusion*)
-This notebook introduces a **hybrid‑fusion architecture** with two modality‑specific branches:
+### 4.1.7 Hybrid‑Fusion CNN (Notebook: *CNN_crossVal_HybridFusion*)
+This notebook introduces a hybrid‑fusion architecture with two modality‑specific branches:
 - A 1D CNN for AUs
 - A dense MLP for eye‑tracking features
 
 The branches are fused before classification.
 
 The intention behind this notebook is to:
-- Allow each modality to learn **specialized representations**
+- Allow each modality to learn specialized representations
 - Evaluate whether hybrid fusion outperforms early fusion
 - Provide a strong multimodal benchmark
 
-### 4.1.8 Early‑Fusion CNN with Independent Test Evaluation
-(Notebook: *CNN_crossVal_EarlyFusion_Test_Eval*)
-This notebook introduces the first **true held‑out test evaluation** for an early‑fusion CNN.
+### 4.1.8 Early‑Fusion CNN with Independent Test Evaluation (Notebook: *CNN_crossVal_EarlyFusion_Test_Eval*)
+This notebook introduces the first true held‑out test evaluation for an early‑fusion CNN.
 A subject‑exclusive train/test split is created before cross‑validation.
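A subject‑exclusive hold‑out split of the kind described above can be expressed with scikit‑learn's GroupShuffleSplit; the names and sizes below are illustrative, and the notebook may construct its split differently:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-ins for features, labels, and per-sample subject IDs.
rng = np.random.default_rng(0)
X = rng.random((20, 4))
y = rng.integers(0, 2, 20)
subjects = np.repeat(np.arange(10), 2)  # 10 subjects, 2 samples each

# Carve out a subject-exclusive test set first; cross-validation then
# runs only on the remaining training subjects.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(gss.split(X, y, groups=subjects))
assert set(subjects[train_idx]).isdisjoint(subjects[test_idx])
```

Doing the hold‑out split before cross‑validation is what makes the later test metrics deployment‑realistic: the test subjects influence neither training nor model selection.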
 The intention behind this notebook is to:
-- Provide a **deployment‑realistic performance estimate**
+- Provide a deployment‑realistic performance estimate
 - Compare validation‑fold behavior with true test‑set behavior
 - Visualize ROC and PR curves for threshold analysis
 
 | Metric / Model | CNN_crossVal_EarlyFusion_Test_Eval |
 |----------------|-------------------------------------|
-| Test Accuracy | |
-| Test F1 | |
-| Test AUC | |
-| Balanced Accuracy | |
-| Precision | |
-| Recall | |
+| Test Accuracy | 0.913 |
+| Test F1 | 0.927 |
+| Test AUC | 0.967 |
+| Balanced Accuracy | 0.907 |
+| Precision | 0.918 |
+| Recall | 0.937 |
 
-### 4.1.9 Hybrid‑Fusion CNN with Independent Test Evaluation
-(Notebook: *CNN_crossVal_HybridFusion_Test_Eval*)
+#### Confusion Matrix
+![Confusion matrix](results/Konfusionsmatrix_EarlyFusion.png)
+
+*Figure 4.1.8.1: Confusion matrix of the Early‑Fusion model.*
+
+#### ROC Curve
+![ROC curve](results/ROC_EarlyFusion.png)
+
+*Figure 4.1.8.2: ROC curve of the Early‑Fusion model.*
+
+### 4.1.9 Hybrid‑Fusion CNN with Independent Test Evaluation (Notebook: *CNN_crossVal_HybridFusion_Test_Eval*)
 This notebook extends hybrid fusion with a subject‑exclusive train/test split and full test‑set evaluation.
 
 The intention behind this notebook is to:
-- Evaluate hybrid fusion under **realistic deployment conditions**
+- Evaluate hybrid fusion under realistic deployment conditions
 - Compare hybrid vs. early fusion on unseen subjects
 - Provide full diagnostic plots (ROC, PR, confusion matrices)
 
 | Metric / Model | CNN_crossVal_HybridFusion_Test_Eval |
 |----------------|--------------------------------------|
-| Test Accuracy | |
-| Test F1 | |
-| Test AUC | |
-| Balanced Accuracy | |
-| Precision | |
-| Recall | |
+| Test Accuracy | 0.950 |
+| Test F1 | 0.959 |
+| Test AUC | 0.983 |
+| Balanced Accuracy | 0.942 |
+| Precision | 0.933 |
+| Recall | 0.986 |
+
+#### Confusion Matrix
+![Confusion matrix](results/Konfusionsmatrix_HybridFusion.png)
+
+*Figure 4.1.9.1: Confusion matrix of the Hybrid‑Fusion model.*
+
+#### ROC Curve
+![ROC curve](results/ROC_HybridFusion.png)
+
+*Figure 4.1.9.2: ROC curve of the Hybrid‑Fusion model.*
 
 ### 4.1.10 Summary
 Across all nine notebooks, the project progresses from a simple AU‑only baseline to advanced multimodal hybrid‑fusion architectures with independent test evaluation.
-This progression reflects increasing methodological rigor and prepares the foundation for selecting a final deployment model.
-Ultimately, the experiments showed that **early fusion and hybrid fusion perform very similarly**, with no substantial performance advantage for either approach.
-Furthermore, even when relying **solely on facial Action Unit data**, the models achieve **strong and competitive results**, indicating that AUs alone already capture a significant portion of the cognitive workload signal.
+The final experiments revealed that hybrid fusion provides a measurable performance advantage over early fusion.
+While both approaches achieve strong results, the hybrid‑fusion model reaches higher overall accuracy (95% vs. 91.3%) and substantially stronger recall (98.6% vs. 93.7%), indicating that it is more effective at correctly identifying high‑workload samples.
+Early fusion, however, shows slightly better precision, suggesting that it produces fewer false positives.
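The tabulated test metrics can all be recomputed from raw held‑out predictions with scikit‑learn; a generic sketch on invented toy labels (not the project's actual results):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Invented ground truth and model scores standing in for the held-out test set.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.7, 0.9, 0.6, 0.55, 0.2])
y_pred = (y_prob >= 0.5).astype(int)   # one false positive at index 5

print("accuracy: ", accuracy_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))
print("auc:      ", roc_auc_score(y_true, y_prob))  # uses scores, not labels
print("bal. acc: ", balanced_accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```

Note that AUC is threshold‑free (computed from the continuous scores), while accuracy, F1, precision, and recall all depend on the 0.5 decision threshold, which is why the ROC and PR curves above are useful for threshold analysis.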
+
+Looking ahead, further improvements could likely be achieved through more extensive hyperparameter tuning, as the current results suggest that additional optimization headroom remains.
 
 ### 4.2 XGBoost
 This documentation outlines the evolution of the XGBoost classification pipeline for cognitive workload detection.
 The project transitioned from a basic unimodal setup to a sophisticated, multi-stage hybrid system incorporating advanced statistical filtering and deep feature extraction.