chapter EDA written
parent
2ec0af5f62
commit
b252082991
@ -23,7 +23,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"file_path = \"adabase-public-0020-v_0_0_2.h5py\""
|
"file_path = \"YOUR_FILE_PATH.h5py\""
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@ -87,7 +87,7 @@
|
|||||||
"id": "a4731c56",
|
"id": "a4731c56",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Actions units"
|
"Insights on actions units"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@ -167,7 +167,7 @@
|
|||||||
"id": "332740a8",
|
"id": "332740a8",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Plots"
|
"Example plot of ECG curve"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@ -177,7 +177,6 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# df_signals_ecg = pd.read_hdf(file_path, \"SIGNALS\", mode=\"r\", columns=[\"STUDY\",\"LEVEL\", \"PHASE\", 'RAW_ECG_I'])\n",
|
|
||||||
"df_signals_ecg = df_signals[[\"STUDY\",\"LEVEL\", \"PHASE\", 'RAW_ECG_I']]\n",
|
"df_signals_ecg = df_signals[[\"STUDY\",\"LEVEL\", \"PHASE\", 'RAW_ECG_I']]\n",
|
||||||
"df_signals_ecg.shape"
|
"df_signals_ecg.shape"
|
||||||
]
|
]
|
||||||
|
|||||||
@ -37,7 +37,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"dataset_path = Path(r\"\")"
|
"dataset_path = Path(r\"\") # TODO: enter path to dataset"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|||||||
@ -36,7 +36,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"path = Path(r\"/home/jovyan/data-paulusjafahrsimulator-gpu/new_datasets/50s_25Hz_dataset.parquet\")\n",
|
"path = Path(r\".parquet\") # TODO: enter path to dataset\n",
|
||||||
"df = pd.read_parquet(path=path)"
|
"df = pd.read_parquet(path=path)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@ -192,18 +192,6 @@
|
|||||||
"display_name": "Python 3 (ipykernel)",
|
"display_name": "Python 3 (ipykernel)",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
"name": "python3"
|
"name": "python3"
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.12.10"
|
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
|
|||||||
@ -155,7 +155,7 @@
|
|||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3 (ipykernel)",
|
"display_name": "310",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
"name": "python3"
|
"name": "python3"
|
||||||
},
|
},
|
||||||
@ -169,7 +169,7 @@
|
|||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.12.10"
|
"version": "3.10.19"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
|
|||||||
@ -68,21 +68,23 @@ Operational note:
|
|||||||
- `DB_PATH` and other paths are currently code-configured and must be adapted per deployment.
|
- `DB_PATH` and other paths are currently code-configured and must be adapted per deployment.
|
||||||
|
|
||||||
## 3) EDA
|
## 3) EDA
|
||||||
TO DO
|
The `EDA` directory provides several files for gaining insight into both the raw AdaBase data and your own dataset.
|
||||||
|
|
||||||
|
- `EDA.ipynb` - main EDA notebook: recreates the plot from the AdaBase documentation, lists all experiments, and generally serves as a playground for getting to know the files.
|
||||||
|
- `distribution_plots.ipynb` - This notebook visualizes the data distributions for each experiment. The goal is to find out whether the split of experiments into high and low cognitive load becomes clearer if some experiments are dropped.
|
||||||
|
- `histogramms.ipynb` - Histogram analysis of low load vs. high load per feature. Additionally, scatter plots per feature are available.
|
||||||
|
- `researchOnSubjectPerformance.ipynb` - This notebook shows how the performance values range across the 30 subjects. The code creates and saves a table in CSV format, which is later used as the foundation of the performance-based split in `model_training/tools/performance_based_split`
|
||||||
|
- `owncloud_file_access.ipynb` - Accesses the files via ownCloud and saves them as .h5 files, consistent with the parquet file creation script
|
||||||
|
- `login.yaml` - stores the URL and password for accessing files from ownCloud; used by `owncloud_file_access.ipynb`
|
||||||
|
- `calculate_replacement_values.ipynb` - notebook that computes fallback/median values for deployment and creates the YAML syntax for embedding them
|
||||||
|
|
||||||
|
General information:
|
||||||
|
- Because the dataset files are large, it is strongly recommended to download and save them once at the beginning
|
||||||
|
- For a better understanding of the data, read the [AdaBase publication](https://www.mdpi.com/1424-8220/23/1/340)
|
||||||
|
|
||||||
- `EDA/EDA.ipynb` - main EDA notebook
|
|
||||||
- `EDA/distribution_plots.ipynb` - distribution visualization
|
|
||||||
- `EDA/histogramms.ipynb` - histogram analysis
|
|
||||||
- `EDA/researchOnSubjectPerformance.ipynb` - subject-level analysis
|
|
||||||
- `EDA/owncloud_file_access.ipynb` - ownCloud exploration/access notebook
|
|
||||||
- `EDA/calculate_replacement_values.ipynb` - fallback/median computation notebook
|
|
||||||
- `EDA/login.yaml` - local auth/config artifact for EDA workflows
|
|
||||||
|
|
||||||
## 4) Model Training
|
## 4) Model Training
|
||||||
|
|
||||||
Location:
|
|
||||||
- `model_training/` (primarily notebook-driven)
|
|
||||||
|
|
||||||
Included model families:
|
Included model families:
|
||||||
- CNN variants (different fusion strategies)
|
- CNN variants (different fusion strategies)
|
||||||
- XGBoost
|
- XGBoost
|
||||||
|
|||||||