/**
 * \page BoxAlgorithm_ClassifierTrainer Classifier trainer
__________________________________________________________________

Detailed description
__________________________________________________________________

 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Description|
The <em>Classifier Trainer</em> box is a generic box for training models to classify input data.
It works in conjunction with the \ref Doc_BoxAlgorithm_ClassifierProcessor box.
This box's role is to expose a generic interface to the rest of the BCI pipeline. The box
will generate an internal structure according to the multiclass strategy and the learning
algorithm selected.
The behavior is simple: the box collects a number of feature vectors. Those feature vectors
are labelled depending on the input they arrive on. When a specific stimulation arrives, a training
process is triggered. This process can take some time, so this box should be used offline. Depending on the
settings you enter, you will be able to perform a k-fold test to estimate the accuracy of the learned
classifier. When the training stimulation is received, the box generates a configuration file that can later
be used online by the \ref Doc_BoxAlgorithm_ClassifierProcessor box.
Finally, the box sends a particular stimulation (OVTK_StimulationId_TrainCompleted)
on its output, which can be used to trigger further processing in the scenario.
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Description|
__________________________________________________________________

Inputs description
__________________________________________________________________

 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Inputs|
This box can have a variable number of inputs. If you need more than two classes, feel free to add more
inputs and to use a proper strategy/classifier combination to handle more than two classes.
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Inputs|
 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Input1|
The first input receives a stimulation stream. Only one stimulation of this stream is important, the one
that triggers the training process. When this stimulation is received, all the feature vectors are labelled
and sent to the classification algorithm. The training is triggered and executed. Then the classification
algorithm generates a configuration file that will be used online by the \ref Doc_BoxAlgorithm_ClassifierProcessor box.
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Input1|
 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Input2|
This input receives the feature vectors for the first class.
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Input2|
 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Input3|
This input receives the feature vectors for the second class.
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Input3|
__________________________________________________________________

Outputs description
__________________________________________________________________

 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Outputs|
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Outputs|
 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Output1|
The stimulation OVTK_StimulationId_TrainCompleted is raised on this output when the classifier trainer has finished its job.
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Output1|
__________________________________________________________________

Settings description
__________________________________________________________________

 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Settings|
The number of settings of this box can vary depending on the classification algorithm you choose. Such an algorithm
may have specific input OpenViBE::Kernel::IParameter objects (see \ref OpenViBE::Kernel::IAlgorithmProxy for details). If
the type of those parameters is simple enough to be handled in the GUI, then additional settings will be added to this box.
<b>After switching a strategy or a classifier, you will have to close and re-open the settings configuration dialog to see the parameters of the new classifier.</b> Supported parameter types are: Integers, Floats, Enumerations, Booleans. The documentation for those
parameters cannot be given on this page, because which classifier, and thus which hyperparameters, will be available
cannot be known in advance. This depends on the classification algorithms implemented in OpenViBE.
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Settings|
 *
 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Setting1|
The stimulation that triggers the training process and saves the learned classifier to disk.
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Setting1|
 *
 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Setting2|
This setting points to the configuration file where the result of the training is saved for later online use. This
configuration file is used by the \ref Doc_BoxAlgorithm_ClassifierProcessor box. Its syntax
depends on the selected algorithm.
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Setting2|
 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Setting3|
This setting is the strategy to use. You can choose any registered \c OVTK_TypeId_ClassificationStrategy
strategy you want.
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Setting3|
 *
 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Setting4|
This is the stimulation to send when the classifier algorithm detects a class-1 feature vector.
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Setting4|
 *
 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Setting5|
This is the stimulation to send when the classifier algorithm detects a class-2 feature vector.
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Setting5|
 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Setting6|
This setting is the classifier to use. You can choose any registered \c OVTK_TypeId_ClassifierAlgorithm
algorithm you want.
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Setting6|
 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Setting10|
If you want to perform a k-fold test, you should enter something other than 0 or 1 here. A k-fold test generally gives
a better estimate of the classifier's accuracy than naive testing with the training data. The classifier may overfit
the training data, and get a good accuracy on the observed data, but not be able to generalize to unseen data.
In cross-validation, the idea is to divide the set of feature vectors into a number of partitions. The classification algorithm
is trained on some of the partitions and its accuracy is tested on the others. However, the classifier produced by the box is
the classifier trained with the whole data. The cross-validation is only an error estimation tool; it does not affect
the resulting model. See the miscellaneous section for details on how the k-fold test is done in this box, and possible
caveats about the cross-validation procedure.
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Setting10|
 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Setting11|
If the class labels are unbalanced, the classifiers tend to be biased towards the majority labels.
This option can be used to resample the dataset so that all classes are featured equally.
The algorithm first looks at how many examples there are in the majority class. Let's say this is n. Then, if class k has m examples,
it will randomly sample n-m examples with replacement from class k, appending them to the dataset. This is done for each class.
In the end, each class has n examples, and all except the majority class contain some duplicate training vectors.
This can be seen as a technique to weight the importance of examples for classifiers that do not support setting example weights
or a class weight prior, and it can in general be attempted with arbitrary learning algorithms.
Enabling this option may make sense if the box is used for incremental learning, where all classes may not be equally represented
in the training data obtained so far, even if the design itself is balanced. Note that enabling this will make the cross-validation
results optimistic. In most conditions, the feature should be disabled. A sketch of the scheme follows.
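\par
For illustration, here is a minimal standalone sketch of this balancing scheme (invented names, not the box's actual code):
\code
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

using FeatureVector = std::vector<double>;

// Pad every class up to the size of the majority class by sampling
// with replacement from that class's original examples.
void balanceClasses(std::vector<std::vector<FeatureVector>>& classes)
{
	std::mt19937 rng(std::random_device{}());
	std::size_t n = 0;
	for (const auto& c : classes) { n = std::max(n, c.size()); }
	for (auto& c : classes)
	{
		if (c.empty()) { continue; }
		std::uniform_int_distribution<std::size_t> pick(0, c.size() - 1); // original examples only
		while (c.size() < n) { c.push_back(c[pick(rng)]); }               // duplicates are expected
	}
}
\endcode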
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Setting11|
__________________________________________________________________

Examples description
__________________________________________________________________

 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Examples|
This box is used in BCI pipelines in order to classify cerebral activity states. For a detailed scenario using this
box and its associated \ref Doc_BoxAlgorithm_ClassifierProcessor, please see the <b>motor imagery</b>
BCI scenario in the sample scenarios. An even simpler tutorial with artificial data
is available in the <b>box-tutorials/</b> folder.
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Examples|
__________________________________________________________________

Miscellaneous description
__________________________________________________________________

 * |OVP_DocBegin_BoxAlgorithm_ClassifierTrainer_Miscellaneous|
The box supports various multiclass strategies and classifiers as plugins.
\par Available strategies:
Strategy refers to how feature vectors are routed to one or more classifiers, which possibly can handle only 2 classes themselves.
\par Native
Use the classifier training algorithm without a pairwise strategy. All the data is passed to a single classifier trainer.
\par One vs All
Use a strategy which consists of training each class against all the others, creating n classifiers for n classes.
\par One vs One
Use a pairwise strategy which trains one classifier for each pair of classes. A decision strategy is then used to extract the most likely class. There are three different decision strategies, as illustrated in the voting sketch below:
\li Voting: a method based on a simple majority voting process
\li HT: the method described in: Hastie, Trevor; Tibshirani, Robert. Classification by pairwise coupling. The Annals of Statistics 26 (1998), no. 2, 451--471
\li PKPD: the method described in: Price, S. Knerr, L. Personnaz, and G. Dreyfus. Pairwise neural network classifiers with probabilistic outputs. In G. Tesauro, D. Touretzky, and T. Leen (eds.)
Advances in Neural Information Processing Systems 7 (NIPS-94), pp. 1109-1116. MIT Press, 1995.
You cannot use every algorithm with every decision strategy, but the interface will restrict the choice according to your selection.
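\par
As a minimal illustration of the voting decision (invented names, not the box's internal API): each pairwise classifier votes for one of its two classes, and the class with the most votes wins.
\code
#include <cstddef>
#include <vector>

// One entry per trained (i, j) pair; `winner` is the class index the
// pairwise classifier chose for the current feature vector.
struct PairwiseVote { std::size_t classA, classB, winner; };

std::size_t decideByVoting(const std::vector<PairwiseVote>& votes, std::size_t nClasses)
{
	std::vector<std::size_t> tally(nClasses, 0);
	for (const auto& v : votes) { ++tally[v.winner]; } // each pair casts one vote
	std::size_t best = 0;
	for (std::size_t k = 1; k < nClasses; ++k) { if (tally[k] > tally[best]) { best = k; } }
	return best; // ties fall back to the lowest class index in this sketch
}
\endcode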
\par Available classifiers:
\par Support Vector Machine (SVM)
A well-known classifier supporting non-linear classification via kernels. The implementation is based on LIBSVM 2.91, which is included in the OpenViBE source tree. The parameters exposed in the GUI correspond to LIBSVM parameters. For more information on LIBSVM, see <a href="http://www.csie.ntu.edu.tw/~cjlin/libsvm/">here</a>.
\par
This algorithm provides only probabilities.
\par Linear Discriminant Analysis (LDA)
A simple and fast linear classifier. For a description, see any major textbook on Machine Learning or Statistics (e.g. Duda, Hart & Stork, or Hastie, Tibshirani & Friedman). This algorithm can be used with a regularized covariance matrix
according to a method proposed by Ledoit & Wolf: "A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices", 2004.
The Linear Discriminant Analysis has the following options.
\par
\li Use shrinkage: Use a classic or a regularized covariance matrix.
\li Shrinkage: A value s in [0,1] sets a linear weight between dataCov and priorCov, i.e. cov=(1-s)*dataCov+s*priorCov (see the sketch below).
A value <0 is used to auto-estimate the shrinkage coefficient (default). If var(x) is a vector of empirical variances of all data dimensions, priorCov is a
diagonal matrix with the single value mean(var(x)) on its diagonal. Used only if Use shrinkage is checked.
\li Force diagonal cov (DDA): This sets the nondiagonal entries of the covariance matrices to zero. Used only if Use shrinkage is checked.
\par
Note that setting shrinkage to 0 should give you the regular LDA behavior. If you additionally force the covariance to be diagonal, you should get a model resembling the Naive Bayes classifier.
\par
This algorithm provides both hyperplane distance and probabilities.
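\par
The shrinkage formula above can be sketched as follows (assuming a dim x dim row-major covariance matrix; a simplified illustration, not OpenViBE's implementation):
\code
#include <cstddef>
#include <vector>

// cov = (1 - s) * dataCov + s * priorCov, where priorCov is the
// diagonal matrix mean(var(x)) * I described above.
std::vector<double> shrinkCovariance(const std::vector<double>& dataCov, std::size_t dim, double s)
{
	double meanVar = 0;
	for (std::size_t i = 0; i < dim; ++i) { meanVar += dataCov[i * dim + i]; }
	meanVar /= double(dim); // mean of the empirical variances
	std::vector<double> cov(dim * dim);
	for (std::size_t i = 0; i < dim; ++i)
		for (std::size_t j = 0; j < dim; ++j)
		{
			const double prior = (i == j) ? meanVar : 0.0; // diagonal prior
			cov[i * dim + j] = (1.0 - s) * dataCov[i * dim + j] + s * prior;
		}
	return cov;
}
\endcode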
\par Multilayer Perceptron (MLP)
A classifier algorithm which relies on an artificial neural network (<a href="https://hal.inria.fr/inria-00099922/en">Laurent Bougrain. Practical introduction to artificial neural networks. IFAC symposium on automation in Mining, Mineral and Metal Processing -
MMM'04, Sep 2004, Nancy, France, 6 p, 2004.</a>). In OpenViBE, the MLP is a 2-layer neural network. The hyperbolic tangent is the activation function of the
neurons inside the hidden layer. The network is trained using backpropagation of the gradient. During the training, 80% of the training set is used to compute the gradient,
and 20% is used to validate the new model. The different weights and biases are updated only once per iteration (just before the validation). A coefficient alpha (learning coefficient) is used to moderate the importance of
the modification of weights and biases to avoid oscillations. The learning stops when the difference of the error per element (computed during validation) between two consecutive iterations is under the value epsilon given as a parameter.
\par
\li Number of neurons in hidden layer: number of neurons that will be used in the hidden layer.
\li Learning stop condition: the epsilon value used to stop the learning.
\li Learning coefficient: a coefficient which influences the speed of learning. The smaller the coefficient, the longer the learning will take and the better the chance of reaching a good solution.
\par
Note that feature vectors are normalized between -1 and 1 (using the min/max of the training set) to avoid saturation of the hyperbolic tangent.
\par
This algorithm provides both hyperplane distance (identity on the output layer) and probabilities (softmax function on the output layer).
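\par
To make the two output modes concrete, here is a minimal sketch of the forward pass described above (tanh hidden layer, linear outputs, softmax for probabilities; invented names, not the box's code):
\code
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // w[i][j]: weight from input j to neuron i

// One fully connected layer: y = w * x + bias.
Vec layer(const Mat& w, const Vec& bias, const Vec& x)
{
	Vec y(w.size());
	for (std::size_t i = 0; i < w.size(); ++i)
	{
		double s = bias[i];
		for (std::size_t j = 0; j < x.size(); ++j) { s += w[i][j] * x[j]; }
		y[i] = s;
	}
	return y;
}

Vec mlpProbabilities(const Mat& w1, const Vec& b1, const Mat& w2, const Vec& b2, const Vec& x)
{
	Vec h = layer(w1, b1, x);
	for (double& v : h) { v = std::tanh(v); } // hidden activation
	Vec o = layer(w2, b2, h);                 // identity output: hyperplane distances
	double m = *std::max_element(o.begin(), o.end());
	double z = 0;                             // softmax turns distances into probabilities
	for (double& v : o) { v = std::exp(v - m); z += v; }
	for (double& v : o) { v /= z; }
	return o;
}
\endcode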
\par Cross Validation
In this section, we will detail how the k-fold test is implemented in this box. For the k-fold test to be performed, you
have to choose more than 1 partition in the related setting. Suppose you chose \c n partitions. Then, when the trigger stimulation
is received, the feature vector set is split into \c n consecutive segments. The classification algorithm is trained on
\c n-1 of those segments and tested on the last one. This is performed for each segment.
For example, suppose you have 5 partitions of feature vectors (\c FVs):
\verbatim
+------+ +------+ +------+ +------+ +------+
| FVs1 | | FVs2 | | FVs3 | | FVs4 | | FVs5 |
+------+ +------+ +------+ +------+ +------+
\endverbatim
For the first training, a feature vector set is built from \c FVs2, \c FVs3, \c FVs4, \c FVs5. The classifier algorithm
is trained on this feature vector set. Then the classifier is tested on \c FVs1:
\verbatim
+------+ +---------------------------------+
| FVs1 | |  Training Feature Vector Set 1  |
+------+ +---------------------------------+
\endverbatim
Then, a feature vector set is built from \c FVs1, \c FVs3, \c FVs4, \c FVs5. The classifier algorithm
is trained on this feature vector set. Then the classifier is tested on \c FVs2:
\verbatim
+-------+ +------+ +------------------------+
| Train | | FVs2 | | ing Feat. Vector Set 2 |
+-------+ +------+ +------------------------+
\endverbatim
The same process is performed on all the partitions:
\verbatim
+---------------+ +------+ +---------------+
|Training Featur| | FVs3 | |e Vector Set 3 |
+---------------+ +------+ +---------------+
+------------------------+ +------+ +------+
|Training Feature Vector | | FVs4 | |Set 4 |
+------------------------+ +------+ +------+
+---------------------------------+ +------+
|  Training Feature Vector Set 5  | | FVs5 |
+---------------------------------+ +------+
\endverbatim
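The segment bookkeeping amounts to computing, for each fold, the index range of the test segment (a simplified standalone sketch, not the box's code):
\code
#include <cstddef>
#include <utility>
#include <vector>

// Split `count` feature vectors into n consecutive segments and return,
// for each fold, the [begin, end) index range of the test segment.
// Training then uses every vector outside that range.
std::vector<std::pair<std::size_t, std::size_t>> kFoldTestRanges(std::size_t count, std::size_t n)
{
	std::vector<std::pair<std::size_t, std::size_t>> folds;
	for (std::size_t k = 0; k < n; ++k)
	{
		folds.emplace_back(k * count / n, (k + 1) * count / n);
	}
	return folds;
}
\endcode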
Important things to consider:
- The more partitions you have, the more feature vectors you have in your training sets... and the fewer examples
you'll have to test on. This means that the result of the test will probably be less reliable.
In conclusion, be careful when choosing this k-fold test setting. Typical values range from 4 partitions (train on 75% of the feature vectors and
test on 25% - 4 times) to 10 partitions (train on 90% of the feature vectors and test on 10% - 10 times).
Note that the cross-validation performed by the classifier trainer box in OpenViBE may be optimistic.
The cross-validation computation is working as it should, but it cannot take into account what happens outside
the classifier trainer box. In OpenViBE scenarios, there may be e.g. time overlap from epoching, feature
vectors drawn from the same epoch ending up in the same cross-validation partition, and (supervised)
preprocessing such as CSP or xDAWN potentially overfitting the data before it is given to the classifier trainer.
Such situations are not compatible with the theoretical assumption that the feature vectors are
independent and identically distributed (the typical i.i.d. assumption in machine learning) across
train and test. To do cross-validation controlling for such issues, we have provided
a more advanced cross-validation tutorial as part of the OpenViBE web documentation.
\par Confusion Matrices
At the end of the training, the box will print one or two confusion matrices, depending on whether cross-validation
was used: one matrix for the cross-validation, the other for the training data. Each matrix contains the true
classes as rows and the predicted classes as columns. The diagonal describes the percentage of correct predictions per class.
Although the matrix can be optimistic (see the above section about cross-validation), it may give useful
diagnostic information. For example, if the accuracy is very skewed towards one class, this may indicate
a problem if the design is supposed to be balanced. The problem may originate e.g. from the original data
source, the signal processing chains for the different classes, or the classifier learning algorithm. These need
then to be investigated. Also, if very low accuracies are observed in these matrices, it may give reason
to suspect that prediction accuracies on fresh data might be likewise lacking -- or worse.
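\par
The row percentages can be computed as in the following sketch (invented names, not the box's printing code):
\code
#include <cstddef>
#include <vector>

// counts[t][p]: number of examples of true class t predicted as class p.
// Returns the matrix with each row expressed in percent, so the diagonal
// holds the per-class accuracy.
std::vector<std::vector<double>> toPercentages(const std::vector<std::vector<std::size_t>>& counts)
{
	std::vector<std::vector<double>> pct(counts.size());
	for (std::size_t t = 0; t < counts.size(); ++t)
	{
		std::size_t total = 0;
		for (std::size_t v : counts[t]) { total += v; }
		pct[t].assign(counts[t].size(), 0.0);
		for (std::size_t p = 0; p < counts[t].size() && total > 0; ++p)
		{
			pct[t][p] = 100.0 * double(counts[t][p]) / double(total);
		}
	}
	return pct;
}
\endcode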
\par Incremental Learning
The box can also be used for simple incremental (online) learning. To achieve this, simply send the box the training
stimulation and it will train a classifier with all the data it has received so far. You can give it more
feature vectors later, and trigger the learning again by sending another stimulation. Likewise, the corresponding
classifier processor box can be made to load new classifiers during playback. With classifiers like LDA,
this practice is usually feasible when the data is reasonably sized (as in basic motor imagery).
 * |OVP_DocEnd_BoxAlgorithm_ClassifierTrainer_Miscellaneous|
 */