.. _Doc_BoxAlgorithm_ClassifierTrainer:

Classifier trainer
==================

.. container:: attribution

   :Author:
      Yann Renard, Guillaume Serriere
   :Company:
      INRIA/IRISA

.. image:: images/Doc_BoxAlgorithm_ClassifierTrainer.png

Performs classifier training with cross-validation-based error estimation.

The *Classifier Trainer* box is a generic box for training models to classify input data.
It works in conjunction with the :ref:`Doc_BoxAlgorithm_ClassifierProcessor` box.
This box's role is to expose a generic interface to the rest of the BCI pipeline. The box
generates an internal structure according to the selected multiclass strategy and learning
algorithm.

The behavior is simple: the box collects a number of feature vectors. Those feature vectors
are labelled according to the input they arrive on. When a specific stimulation arrives, a training
process is triggered. This process can take some time, so this box should be used offline. Depending on the
settings you enter, you can perform a k-fold test to estimate the accuracy of the learned
classifier. When the training stimulation is received, the box generates a configuration file that can
be used online by the :ref:`Doc_BoxAlgorithm_ClassifierProcessor` box.

Finally, the box sends a particular stimulation (``OVTK_StimulationId_TrainCompleted``)
on its output, which can be used to trigger further processing in the scenario.

Inputs
------

.. csv-table::
   :header: "Input Name", "Stream Type"

   "Stimulations", "Stimulations"
   "Features for class 1", "Feature vector"
   "Features for class 2", "Feature vector"

This box can have a variable number of inputs. If you need more than two classes, feel free to add more
inputs and to use a proper strategy/classifier combination to handle more than two classes.

Stimulations
~~~~~~~~~~~~

The first input receives a stimulation stream. Only one stimulation of this stream is important: the one
that triggers the training process. When this stimulation is received, all the feature vectors are labelled
and sent to the classification algorithm. The training is triggered and executed. Then the classification
algorithm generates a configuration file that will be used online by the :ref:`Doc_BoxAlgorithm_ClassifierProcessor` box.

Features for class 1
~~~~~~~~~~~~~~~~~~~~

This input receives the feature vectors for the first class.

Features for class 2
~~~~~~~~~~~~~~~~~~~~

This input receives the feature vectors for the second class.

Outputs
-------

.. csv-table::
   :header: "Output Name", "Stream Type"

   "Train-completed Flag", "Stimulations"

Train-completed Flag
~~~~~~~~~~~~~~~~~~~~

The stimulation ``OVTK_StimulationId_TrainCompleted`` is raised on this output when the classifier trainer has finished its job.

.. _Doc_BoxAlgorithm_ClassifierTrainer_Settings:

Settings
--------

.. csv-table::
   :header: "Setting Name", "Type", "Default Value"

   "Train trigger", "Stimulation", "OVTK_StimulationId_Train"
   "Filename to save configuration to", "Filename", "${Path_UserData}/my-classifier.xml"
   "Multiclass strategy to apply", "Classification strategy", "Native"
   "Class 1 label", "Stimulation", "OVTK_StimulationId_Label_01"
   "Class 2 label", "Stimulation", "OVTK_StimulationId_Label_02"
   "Algorithm to use", "Classification algorithm", "Linear Discrimimant Analysis (LDA)"
   "Use shrinkage", "Boolean", "false"
   "Shrinkage coefficient (-1 == auto)", "Float", "-1.000000"
   "Shrinkage: Force diagonal cov (DDA)", "Boolean", "false"
   "Number of partitions for k-fold cross-validation test", "Integer", "10"
   "Balance classes", "Boolean", "false"

The number of settings of this box can vary depending on the classification algorithm you choose. Such an algorithm
may have specific input ``OpenViBE::Kernel::IParameter`` objects (see ``OpenViBE::Kernel::IAlgorithmProxy`` for details). If
the type of those parameters is simple enough to be handled in the GUI, then additional settings are added to this box.
**After switching a strategy or a classifier, you will have to close and re-open the settings configuration dialog to see the parameters of the new classifier.**
Supported parameter types are: Integers, Floats, Enumerations, Booleans. Those
parameters cannot be documented on this page because it is impossible to know in advance which classifier, and thus which
hyperparameters, you will have available. This depends on the classification algorithms that are implemented in OpenViBE.

Train trigger
~~~~~~~~~~~~~

The stimulation that triggers the training process and saves the learned classifier to disk.

Filename to save configuration to
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This setting is the path of the configuration file in which the result of the training is saved for later online use. This
configuration file is used by the :ref:`Doc_BoxAlgorithm_ClassifierProcessor` box. Its syntax
depends on the selected algorithm.

Multiclass strategy to apply
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This setting is the strategy to use. You can choose any registered ``OVTK_TypeId_ClassificationStrategy``
strategy you want.

Class 1 label
~~~~~~~~~~~~~

This is the stimulation to send when the classifier algorithm detects a class-1 feature vector.

Class 2 label
~~~~~~~~~~~~~

This is the stimulation to send when the classifier algorithm detects a class-2 feature vector.

Algorithm to use
~~~~~~~~~~~~~~~~

This setting is the classifier to use. You can choose any registered ``OVTK_TypeId_ClassifierAlgorithm``
algorithm you want.

Number of partitions for k-fold cross-validation test
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to perform a k-fold test, enter a value other than 0 or 1 here. A k-fold test generally gives
a better estimate of the classifier's accuracy than naive testing with the training data. The classifier may overfit
the training data and reach a good accuracy on the observed data, yet be unable to generalize to unseen data.
In cross-validation, the idea is to divide the set of feature vectors into a number of partitions. The classification algorithm
is trained on some of the partitions and its accuracy is tested on the others. However, the classifier produced by the box is
the classifier trained with the whole data. The cross-validation is only an error estimation tool; it does not affect
the resulting model. See the :ref:`Miscellaneous <Doc_BoxAlgorithm_ClassifierTrainer_Miscellaneous>` section for details on how the k-fold test is done in this box, and possible
caveats about the cross-validation procedure.

Balance classes
~~~~~~~~~~~~~~~

If the number of class labels is unbalanced, classifiers tend to be biased towards the majority labels.
This option can be used to resample the dataset so that all classes are equally represented.
The algorithm first counts how many examples there are in the majority class. Let's say this is n. Then, if class k has m examples,
it randomly samples n-m examples with replacement from class k and appends them to the dataset. This is done for each class.
In the end, each class has n examples, and all classes except the majority class contain some duplicate training vectors.
This can be seen as a technique to weight the importance of examples for classifiers that do not support setting example weights
or class weight priors, and it can in general be attempted with arbitrary learning algorithms.

Enabling this option may make sense if the box is used for incremental learning, where all classes may not be equally represented
in the training data obtained so far, even if the design itself is balanced. Note that enabling this will make the cross-validation
results optimistic. In most conditions, the feature should be disabled.

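The resampling described above can be sketched in a few lines of Python. This is only an illustration of the technique, not the code used by the box: the function name ``balance_classes`` and the ``seed`` parameter are hypothetical.

.. code:: python

   import random
   from collections import Counter


   def balance_classes(vectors, labels, seed=None):
       """Oversample every minority class, with replacement, up to the majority size."""
       rng = random.Random(seed)
       by_class = {}
       for vector, label in zip(vectors, labels):
           by_class.setdefault(label, []).append(vector)

       # n is the number of examples in the majority class.
       n = max(len(members) for members in by_class.values())

       balanced_vectors, balanced_labels = list(vectors), list(labels)
       for label, members in by_class.items():
           # A class with m examples receives n - m duplicates (none for the majority).
           extra = rng.choices(members, k=n - len(members))
           balanced_vectors.extend(extra)
           balanced_labels.extend([label] * len(extra))
       return balanced_vectors, balanced_labels


   vectors = [[0.1], [0.2], [0.3], [0.9]]
   labels = [1, 1, 1, 2]
   balanced_vectors, balanced_labels = balance_classes(vectors, labels, seed=0)
   print(Counter(balanced_labels))  # both classes now have 3 examples

Because the duplicated vectors can land in both the training and the test partitions, this sketch also shows why the cross-validation estimate becomes optimistic when balancing is enabled.
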
.. _Doc_BoxAlgorithm_ClassifierTrainer_Examples:

Examples
--------

This box is used in BCI pipelines to classify cerebral activity states. For a detailed scenario using this
box and its associated :ref:`Doc_BoxAlgorithm_ClassifierProcessor`, please see the **motor imagery**
BCI scenario in the sample scenarios. An even simpler tutorial with artificial data
is available in the **box-tutorials/** folder.

.. _Doc_BoxAlgorithm_ClassifierTrainer_Miscellaneous:

Miscellaneous
-------------

The box supports various multiclass strategies and classifiers as plugins.

Available strategies
~~~~~~~~~~~~~~~~~~~~

Strategy refers to how feature vectors are routed to one or more classifiers, which may themselves be able to handle only 2 classes.

**Native**

Use the classifier training algorithm without a pairwise strategy. All the data is passed to a single classifier trainer.

**One Vs All**

Use a strategy which consists of training each class against all the others, creating n classifiers for n classes.

**One vs One**

Use a pairwise strategy which trains one classifier for each pair of classes. A decision strategy is then used to extract the most likely class. There are three different decision strategies:

- Voting: method based on a simple majority voting process.
- HT: method described in: Hastie, Trevor; Tibshirani, Robert. Classification by pairwise coupling. The Annals of Statistics 26 (1998), no. 2, 451--471.
- PKPD: method described in: D. Price, S. Knerr, L. Personnaz, and G. Dreyfus. Pairwise neural network classifiers with probabilistic outputs. In G. Tesauro, D. Touretzky, and T. Leen (eds.),
  Advances in Neural Information Processing Systems 7 (NIPS-94), pp. 1109-1116. MIT Press, 1995.

You cannot use every algorithm with every decision strategy, but the interface will restrict the choice according to your selection.

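As an illustration of the One vs One strategy with the Voting decision, here is a minimal Python sketch. The names ``one_vs_one_vote`` and ``pairwise_decide`` are hypothetical, the toy pairwise rule (nearest class center) stands in for a trained two-class classifier, and the tie-breaking rule is an assumption rather than the documented behavior of the box.

.. code:: python

   from collections import Counter
   from itertools import combinations


   def one_vs_one_vote(classes, pairwise_decide, x):
       """Let every pair of classes vote and return the class with the most votes."""
       votes = Counter()
       for a, b in combinations(classes, 2):  # n * (n - 1) / 2 pairwise classifiers
           votes[pairwise_decide(a, b, x)] += 1
       # Assumption: ties are broken in favor of the class listed first.
       return max(classes, key=lambda c: votes[c])


   # Toy pairwise rule: pick whichever of the two classes has the nearer center.
   centers = {1: 0.0, 2: 5.0, 3: 10.0}


   def nearest_center(a, b, x):
       return a if abs(x - centers[a]) <= abs(x - centers[b]) else b


   print(one_vs_one_vote([1, 2, 3], nearest_center, 4.2))  # class 2 wins 2 of 3 votes

For n classes this trains n(n-1)/2 classifiers, compared with n classifiers for One Vs All.
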
Available classifiers
~~~~~~~~~~~~~~~~~~~~~

**Support Vector Machine (SVM)**

A well-known classifier supporting non-linear classification via kernels. The implementation is based on LIBSVM 2.91, which is included in the OpenViBE source tree. The parameters exposed in the GUI correspond to LIBSVM parameters. For more information on LIBSVM, see the `LIBSVM home page <http://www.csie.ntu.edu.tw/~cjlin/libsvm/>`_.

This algorithm provides only probabilities.

**Linear Discriminant Analysis (LDA)**

A simple and fast linear classifier. For a description, see any major textbook on Machine Learning or Statistics (e.g. Duda, Hart & Stork, or Hastie, Tibshirani & Friedman). This algorithm can be used with a regularized covariance matrix
according to a method proposed by Ledoit & Wolf: "A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices", 2004.
The Linear Discriminant Analysis has the following options.

- Use shrinkage: Use a regularized covariance matrix instead of the classic one.
- Shrinkage: A value ``s`` in [0,1] sets a linear weight between ``dataCov`` and ``priorCov``, i.e. ``cov = (1-s)*dataCov + s*priorCov``.
  A value < 0 is used to auto-estimate the shrinkage coefficient (default). If ``var(x)`` is a vector of empirical variances of all data dimensions, ``priorCov`` is a
  diagonal matrix with the single value ``mean(var(x))`` repeated on its diagonal. Used only if Use shrinkage is checked.
- Force diagonal cov (DDA): This sets the nondiagonal entries of the covariance matrices to zero. Used only if Use shrinkage is checked.

Note that setting shrinkage to 0 should get you the regular LDA behavior. If you additionally force the covariance to be diagonal, you should get a model resembling the Naive Bayes classifier.

This algorithm provides both hyperplane distance and probabilities.

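The shrinkage formula for a fixed coefficient can be sketched as follows. This is an illustration only: ``shrink_covariance`` is a hypothetical name, the sketch assumes that ``var(x)`` is the diagonal of ``dataCov``, and it does not cover the automatic estimation of the coefficient that is used for negative values.

.. code:: python

   def shrink_covariance(data_cov, s):
       """Return (1 - s) * dataCov + s * priorCov for a fixed s in [0, 1]."""
       dim = len(data_cov)
       # priorCov is a diagonal matrix holding mean(var(x)) on its diagonal.
       mean_var = sum(data_cov[i][i] for i in range(dim)) / dim
       return [
           [
               (1 - s) * data_cov[i][j] + (s * mean_var if i == j else 0.0)
               for j in range(dim)
           ]
           for i in range(dim)
       ]


   data_cov = [[4.0, 1.0], [1.0, 2.0]]
   print(shrink_covariance(data_cov, 0.0))  # unchanged: regular LDA covariance
   print(shrink_covariance(data_cov, 0.5))  # halfway between dataCov and priorCov
   print(shrink_covariance(data_cov, 1.0))  # priorCov only: 3.0 on the diagonal

With ``s = 1`` every dimension receives the same variance and all correlations vanish, which is the most heavily regularized end of the range.
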
**Multilayer Perceptron (MLP)**

A classifier algorithm which relies on an artificial neural network (`Laurent Bougrain. Practical introduction to artificial neural networks. IFAC symposium on automation in Mining, Mineral and Metal Processing - MMM'04, Sep 2004, Nancy, France, 6 p, 2004. <https://hal.inria.fr/inria-00099922/en>`_).
In OpenViBE, the MLP is a 2-layer neural network. The hyperbolic tangent is the activation function of the
neurons inside the hidden layer. The network is trained using backpropagation of the gradient. During the training, 80% of the training set is used to compute the gradient,
and 20% is used to validate the new model. The weights and biases are updated only once per iteration (just before the validation). A coefficient alpha (learning coefficient) is used to moderate the size of
the modification of weights and biases to avoid oscillations. The learning stops when the difference of the error per element (computed during validation) between two consecutive iterations is under the value epsilon given as a parameter.

- Number of neurons in hidden layer: number of neurons that will be used in the hidden layer.
- Learning stop condition: the epsilon value used to stop the learning.
- Learning coefficient: a coefficient which influences the speed of learning. The smaller the coefficient is, the longer the learning will take, but the better the chance of reaching a good solution.

Note that feature vectors are normalized between -1 and 1 (using the min/max of the training set) to avoid saturation of the hyperbolic tangent.

This algorithm provides both hyperplane distance (identity of output layer) and probabilities (softmax function on output layer).

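The input normalization and the softmax output mentioned above can be sketched as follows. The function names are hypothetical and this is not the OpenViBE implementation. In particular, mapping a constant feature to 0 is an assumption made here to avoid a division by zero.

.. code:: python

   import math


   def fit_min_max(training_vectors):
       """Return the per-feature minimum and maximum over the training set."""
       columns = list(zip(*training_vectors))
       return [min(c) for c in columns], [max(c) for c in columns]


   def normalize(vector, mins, maxs):
       """Map each feature linearly to [-1, 1] using the training min/max."""
       return [
           2.0 * (v - lo) / (hi - lo) - 1.0 if hi > lo else 0.0
           for v, lo, hi in zip(vector, mins, maxs)
       ]


   def softmax(outputs):
       """Turn raw output-layer values into probabilities that sum to 1."""
       peak = max(outputs)  # subtracting the maximum keeps exp() from overflowing
       exps = [math.exp(o - peak) for o in outputs]
       total = sum(exps)
       return [e / total for e in exps]


   training = [[0.0, 10.0], [4.0, 30.0], [2.0, 20.0]]
   mins, maxs = fit_min_max(training)
   print(normalize([2.0, 20.0], mins, maxs))  # the midpoint of each range maps to 0
   print(softmax([1.0, 1.0]))  # equal outputs give equal probabilities

Feature vectors seen later, outside the training range, would fall outside [-1, 1] under this mapping, since the min/max come from the training set only.
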
Cross Validation
~~~~~~~~~~~~~~~~

This section details how the k-fold test is implemented in this box. For the k-fold test to be performed, you
have to choose more than 1 partition in the related setting. Suppose you chose ``n`` partitions. When the trigger stimulation
is received, the feature vector set is split into ``n`` consecutive segments. The classification algorithm is trained on
``n-1`` of those segments and tested on the remaining one. This is repeated so that each segment is used once for testing.

For example, suppose you have 5 partitions of feature vectors (``FVs``):

.. code::

   +------+ +------+ +------+ +------+ +------+
   | FVs1 | | FVs2 | | FVs3 | | FVs4 | | FVs5 |
   +------+ +------+ +------+ +------+ +------+

For the first training, a feature vector set is built from ``FVs2``, ``FVs3``, ``FVs4`` and ``FVs5``. The classifier algorithm
is trained on this feature vector set. Then the classifier is tested on ``FVs1``:

.. code::

   +------+ +---------------------------------+
   | FVs1 | |  Training Feature Vector Set 1  |
   +------+ +---------------------------------+

Then, a feature vector set is built from ``FVs1``, ``FVs3``, ``FVs4`` and ``FVs5``. The classifier algorithm
is trained on this feature vector set. Then the classifier is tested on ``FVs2``:

.. code::

   +-------+ +------+ +------------------------+
   | Train | | FVs2 | | ing Feat. Vector Set 2 |
   +-------+ +------+ +------------------------+

The same process is performed on all the partitions:

.. code::

   +---------------+ +------+ +---------------+
   |Training Featur| | FVs3 | |e Vector Set 3 |
   +---------------+ +------+ +---------------+

   +------------------------+ +------+ +------+
   |Training Feature Vector | | FVs4 | |Set 4 |
   +------------------------+ +------+ +------+

   +---------------------------------+ +------+
   |  Training Feature Vector Set 5  | | FVs5 |
   +---------------------------------+ +------+

Important things to consider:

- The more partitions you have, the more feature vectors you have in your training sets, and the fewer examples
  you have to test on. This means that the result of the test will probably be less reliable.

In conclusion, be careful when choosing this k-fold test setting. Typical values range from 4 partitions (train on 75% of the feature vectors and
test on 25%, 4 times) to 10 partitions (train on 90% of the feature vectors and test on 10%, 10 times).

Note that the cross-validation performed by the classifier trainer box in OpenViBE may be optimistic.
The cross-validation computation works as it should, but it cannot take into account what happens outside
the classifier trainer box. In OpenViBE scenarios, there may be e.g. time overlap from epoching, feature
vectors drawn from the same epoch ending up in the same cross-validation partition, and (supervised)
preprocessing such as CSP or xDAWN potentially overfitting the data before it is given to the classifier trainer.
Such situations are not compatible with the theoretical assumption that the feature vectors are
independent and identically distributed (the typical iid assumption in machine learning) across
train and test. To do cross-validation controlling for such issues, we have provided
a more advanced cross-validation tutorial as part of the OpenViBE web documentation.

Confusion Matrices
~~~~~~~~~~~~~~~~~~

At the end of the training, the box prints one or two confusion matrices, depending on whether cross-validation
was used: one matrix for the cross-validation, the other for the training data. Each matrix contains the true
class as rows and the predicted class as columns. The diagonal describes the percentage of correct predictions per class.
Although the matrix can be optimistic (see the section above about cross-validation), it may give useful
diagnostic information. For example, if the accuracy is very skewed towards one class, this may indicate
a problem if the design is supposed to be balanced. The problem may originate e.g. from the original data
source, the signal processing chains for the different classes, or the classifier learning algorithm. These then need
to be investigated. Also, if very low accuracies are observed in these matrices, it may give reason
to suspect that prediction accuracies on fresh data might be likewise lacking -- or worse.

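To make the layout concrete, the following Python sketch builds such a matrix with true classes as rows and predicted classes as columns. The function name is hypothetical, and normalizing each row to percentages is an assumption chosen to match the statement that the diagonal holds the percentage of correct predictions per class.

.. code:: python

   def confusion_matrix_percent(true_labels, predicted_labels, classes):
       """Rows are true classes, columns are predicted classes, values are row percentages."""
       index = {c: i for i, c in enumerate(classes)}
       counts = [[0] * len(classes) for _ in classes]
       for true, predicted in zip(true_labels, predicted_labels):
           counts[index[true]][index[predicted]] += 1

       matrix = []
       for row in counts:
           total = sum(row)
           # A class without any example yields an all-zero row.
           matrix.append([100.0 * c / total if total else 0.0 for c in row])
       return matrix


   true_labels = [1, 1, 1, 1, 2, 2]
   predicted_labels = [1, 1, 1, 2, 2, 1]
   for row in confusion_matrix_percent(true_labels, predicted_labels, [1, 2]):
       print(row)
   # Row 1: 75% of class-1 vectors predicted as class 1, 25% as class 2.
   # Row 2: 50% of class-2 vectors predicted as class 1, 50% as class 2.

In this toy example the accuracy is skewed towards class 1, which is the kind of pattern worth investigating when the design is supposed to be balanced.
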
Incremental Learning
~~~~~~~~~~~~~~~~~~~~

The box can also be used for simple incremental (online) learning. To achieve this, simply send the box the training
stimulation and it will train a classifier with all the data it has received so far. You can give it more
feature vectors later, and trigger the learning again by sending another stimulation. Likewise, the corresponding
classifier processor box can be made to load new classifiers during playback. With classifiers like LDA,
this practice is usually feasible when the data is reasonably sized (as in basic motor imagery).