Supervised classification or predictors


Predictors are used to assign new data (expression, proteins, metabolites…) to a specific class (e.g. diseased case or healthy control). The assignment is based on a rule constructed from a previous dataset that contains the classes we aim to discriminate between. This dataset is usually known as the training set. The rationale behind this strategy is the following: if the differences between the classes (our macroscopic observations, e.g. cancer versus healthy cases) are a consequence of certain differences at the gene level, and these differences can be measured as differences in gene expression levels, then it is (in theory) possible to find these gene expression differences and use them to assign a new array to its class. This is not always easy, but it is an achievable goal. Different mathematical methods and operational strategies can be used for this purpose.
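The idea of "learning a rule from a training set, then assigning a new array to a class" can be illustrated with a minimal sketch. It assumes a nearest-centroid rule (one simple method among many, not necessarily one that Babelomics implements), and the expression values are made-up toy numbers used only for illustration; the function names are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def train_centroids(samples, labels):
    """Build the rule from the training set: one mean expression profile per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign a new array to the class whose centroid is closest."""
    return min(sorted(centroids), key=lambda y: dist(centroids[y], x))


# Hypothetical toy training set: rows are arrays, columns are three genes.
train_x = [[5.1, 2.0, 7.3], [4.8, 2.2, 7.0], [1.2, 6.5, 3.1], [1.0, 6.9, 2.8]]
train_y = ["cancer", "cancer", "healthy", "healthy"]

rule = train_centroids(train_x, train_y)
print(predict(rule, [4.9, 2.1, 6.9]))  # → cancer
```

The new array is labelled "cancer" because its expression profile lies much closer to the mean profile of the cancer arrays than to that of the healthy ones.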

Babelomics includes a supervised classification module that helps in building a “good predictor”. In this module:

  • We have implemented several widely accepted strategies so that the tool can build simple yet powerful predictors, together with a carefully designed cross-validation of the whole process (to avoid the widespread problem of “selection bias”).
  • Babelomics allows combining several classification algorithms with different methods for gene selection.
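  • The main indicators used to assess the quality of the prediction are accuracy, MCC (Matthews correlation coefficient), AUC (area under the ROC curve) and RMSE (root mean squared error).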
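Selection bias appears when genes are selected using all arrays, including those later used for testing, so the error estimate becomes over-optimistic. A minimal sketch of cross-validating the whole process follows: gene selection is repeated inside every fold, using only that fold's training arrays. The assumptions are loud here: leave-one-out cross-validation, two classes, genes ranked by the absolute difference of class means, a nearest-centroid classifier and made-up toy data. Babelomics' actual gene selection methods, classifiers and validation scheme may differ, and the names below are hypothetical. Only accuracy and MCC are computed; AUC and RMSE are omitted for brevity.

```python
from math import dist, sqrt


def mean(v):
    return sum(v) / len(v)


def select_genes(xs, ys, k):
    """Rank genes by |difference of class means| using ONLY the arrays given (two classes assumed)."""
    c0, c1 = sorted(set(ys))
    a = [x for x, y in zip(xs, ys) if y == c0]
    b = [x for x, y in zip(xs, ys) if y == c1]
    score = [abs(mean([r[g] for r in a]) - mean([r[g] for r in b])) for g in range(len(xs[0]))]
    return sorted(range(len(score)), key=lambda g: -score[g])[:k]


def project(row, genes):
    """Keep only the selected genes of one array."""
    return [row[g] for g in genes]


def nearest_centroid(xs, ys, x):
    """Classify x by the closest class mean profile."""
    cents = {c: [mean(col) for col in zip(*[r for r, y in zip(xs, ys) if y == c])]
             for c in sorted(set(ys))}
    return min(cents, key=lambda c: dist(cents[c], x))


def loocv(xs, ys, k):
    """Leave-one-out CV of the WHOLE process: gene selection is redone in every fold."""
    preds = []
    for i in range(len(xs)):
        tr_x, tr_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        genes = select_genes(tr_x, tr_y, k)  # never sees the held-out array
        preds.append(nearest_centroid([project(r, genes) for r in tr_x], tr_y,
                                      project(xs[i], genes)))
    return preds


def accuracy(ys, preds):
    return sum(y == p for y, p in zip(ys, preds)) / len(ys)


def mcc(ys, preds, positive):
    """Matthews correlation coefficient for a binary problem."""
    pairs = list(zip(ys, preds))
    tp = sum(y == positive and p == positive for y, p in pairs)
    tn = sum(y != positive and p != positive for y, p in pairs)
    fp = sum(y != positive and p == positive for y, p in pairs)
    fn = sum(y == positive and p != positive for y, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: 6 arrays x 4 genes; genes 0 and 1 differ between classes.
xs = [[8.0, 1.0, 5.0, 3.0], [7.5, 1.2, 4.0, 3.5], [8.2, 0.9, 6.0, 2.5],
      [2.0, 6.0, 5.5, 3.0], [2.5, 5.8, 4.5, 2.8], [1.8, 6.3, 5.0, 3.2]]
ys = ["case"] * 3 + ["control"] * 3

preds = loocv(xs, ys, k=2)
print(accuracy(ys, preds), mcc(ys, preds, "case"))  # → 1.0 1.0
```

The key design choice is that `select_genes` is called inside the loop on the training fold only; selecting genes once on all six arrays before the loop is exactly the shortcut that produces selection bias.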

Activities

We have prepared two activities that show how predictors can be generated with Babelomics.

Here you can find more detailed information about the supervised classification module in Babelomics.


Results: pdf, docx