Supervised classification of RNA-Seq data from lung squamous cell carcinoma


Data description

RNA-Seq data from lung squamous cell carcinoma (LUSC) samples, obtained from The Cancer Genome Atlas (TCGA) data portal.

Goals

  1. We want to train several classification models in Babelomics.
  2. We then evaluate which model best classifies the samples of an independent test dataset.

Work plan

  1. Download tca_gene_lusc_train.txt. It contains 11 Normal and 150 Tumor samples.
  2. Download tca_gene_lusc_test.txt. It contains 6 Normal and 75 Tumor samples.
  3. Upload both files to Babelomics 5.0 and go to the Expression > Class Prediction section.
  4. Try several classification strategies:
    • Select SVM, KNN and Random Forest as classifiers
    • Select leave-one-out cross-validation for error estimation
    • Select Correlation-based Feature Selection (CFS)
  5. Download test_result.txt and answer the following:
    • Which supervised classification method(s) work best?
    • How many genes were used for the prediction?
    • Are the selected genes the same for all methods?
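Leave-one-out error estimation holds out each training sample once, trains on the remaining samples, predicts the held-out one, and reports the fraction of wrong predictions. The sketch below illustrates the idea with a plain k-nearest-neighbours classifier written in standard-library Python. It is not Babelomics' implementation: the Euclidean distance, k = 3, and the two-gene toy matrix are assumptions chosen only for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training samples closest to query."""
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def loo_error(X, y, k=3):
    """Leave-one-out error: hold out each sample once and train on all the others."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) != y[i]:
            errors += 1
    return errors / len(X)

# Hypothetical toy data: 6 samples x 2 genes, two well-separated classes.
X = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
y = ["Normal", "Normal", "Normal", "Tumor", "Tumor", "Tumor"]
print(loo_error(X, y, k=3))  # 0.0
```

With 161 training samples, leave-one-out trains each classifier 161 times, which is affordable here and uses almost all samples for every model.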
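Correlation-based Feature Selection (CFS, after Hall) scores a gene subset S of size k with the heuristic merit = k·r_cf / sqrt(k + k(k-1)·r_ff), where r_cf is the mean gene-class correlation and r_ff the mean gene-gene correlation. Subsets of genes that each track the class but not one another therefore score highest. The minimal sketch below assumes absolute Pearson correlation with the class coded 0 = Normal, 1 = Tumor, and replaces Hall's best-first search with a simpler greedy forward search. The settings Babelomics uses may differ, and the gene names and values are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def cfs_merit(subset, features, labels):
    """Hall's CFS merit of a gene subset, using absolute Pearson correlations."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[g], labels)) for g in subset) / k
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
        r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(features, labels):
    """Greedy forward search: add the gene that raises merit most; stop when none does."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        merit, gene = max((cfs_merit(selected + [g], features, labels), g) for g in remaining)
        if merit <= best:
            break
        selected.append(gene)
        remaining.remove(gene)
        best = merit
    return selected, best

# Hypothetical toy data: geneA and geneB are near-copies, geneC is unrelated to class.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],
    "geneB": [1.1, 1.3, 1.0, 5.1, 5.2, 4.9],
    "geneC": [3.0, 1.0, 2.0, 2.0, 3.0, 1.0],
}
selected, merit = cfs_forward(features, labels)
print(selected, round(merit, 3))
```

On this toy data only one of the two redundant genes is kept and the unrelated gene is dropped. If genes are selected on all samples before leave-one-out, the error estimate tends to be optimistic, so selection is normally repeated inside each fold; this is also one reason why different runs or methods can end up with different gene lists.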
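When comparing methods, the class imbalance matters: a trivial rule that labels every sample "Tumor" is already correct for 150/161 ≈ 93.2% of the training set and 75/81 ≈ 92.6% of the test set, while never detecting a Normal sample. The sketch below computes a few per-class metrics from confusion-matrix counts to make that visible. Treating Tumor as the positive class is an assumption, and the metrics reported in test_result.txt may not be the same ones.

```python
def metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (Tumor recall), specificity (Normal recall), balanced accuracy."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Trivial baseline on the test set (75 Tumor, 6 Normal): predict "Tumor" for everyone.
baseline = metrics(tp=75, fn=0, tn=0, fp=6)
print(round(baseline["accuracy"], 3))  # 0.926
print(baseline["balanced_accuracy"])   # 0.5
```

A method is only useful here if it beats this baseline on the Normal samples too, so specificity or balanced accuracy is a more informative basis for "which method works best" than raw accuracy alone.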