Supervised classification for RNA-Seq data of Lung squamous cell carcinoma
Data description
The dataset consists of RNA-Seq data from lung squamous cell carcinoma (LUSC) samples, taken from The Cancer Genome Atlas (TCGA) data portal.
Goals
- Train several classification models in Babelomics.
- Then use a test dataset to evaluate which model classifies our data best.
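When comparing models on the test dataset, keep in mind that the classes are very unequal in size (the work plan below lists 6 Normal and 75 Tumor samples in the test file). The following is a minimal sketch in plain Python, not part of Babelomics, showing why plain accuracy can mislead in that situation. The function names and the "always Tumor" predictor are hypothetical and used only for illustration.

```python
def confusion_counts(true_labels, predicted_labels, positive="Tumor"):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of the per-class recall; assumes both classes are present."""
    sensitivity = tp / (tp + fn)  # recall on Tumor
    specificity = tn / (tn + fp)  # recall on Normal
    return (sensitivity + specificity) / 2


# Class sizes of the test file: 6 Normal and 75 Tumor samples.
true_labels = ["Normal"] * 6 + ["Tumor"] * 75

# Hypothetical predictor that ignores the data and always answers "Tumor".
always_tumor = ["Tumor"] * len(true_labels)

counts = confusion_counts(true_labels, always_tumor)
print(round(accuracy(*counts), 3))   # 0.926
print(balanced_accuracy(*counts))    # 0.5
```

A predictor that never recognises a Normal sample still reaches about 93% accuracy here, so it is worth looking at per-class results as well as the overall error rate when judging the methods.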
Work plan
- Download tca_gene_lusc_train.txt. It contains 11 Normal and 150 Tumor samples.
- Download tca_gene_lusc_test.txt. It contains 6 Normal and 75 Tumor samples.
- Upload your files to Babelomics 5.0 and go to the section Expression > Class Prediction.
- Try several classification strategies:
  - Select SVM, KNN and Random Forest
  - Select Leave-one-out for error estimation
  - Select Correlation-based Feature Selection (CFS)
- Download test_result.txt
- Which supervised classification method(s) work best?
- How many genes were used for the prediction?
- Are the selected genes the same for all methods?