In this example we are going to analyse a dataset from Golub et al. (1999). In that paper the authors studied two different types of leukemia, acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), in order to detect differences between them. This dataset has 3051 genes and 38 arrays, 27 of them labeled as ALL and 11 as AML.
Using Class prediction we are going to build a predictor that tries to distinguish between the two classes. The train file contains 30 arrays, 21 ALL and 9 AML. The remaining 8 arrays, 6 ALL and 2 AML, are in the test file and will be used for prediction.
You can find the dataset for this exercise in the following files:
ALL ALL ALL ALL ALL ALL AML AML
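To make the train/test workflow described above more concrete, here is a minimal sketch in Python. It is not the method used by the Class prediction tool, which this text does not specify: a simple nearest-centroid classifier is swapped in purely for illustration. The expression values are synthetic random numbers, not the Golub data; only the class counts (21 ALL and 9 AML for training, 6 ALL and 2 AML for testing) come from the description above. The function names, the number of genes (`N_GENES`, reduced to 50 to keep the example fast) and the size of the simulated class difference are all assumptions.

```python
import random


def centroid(rows):
    """Mean expression profile (one value per gene) over a list of arrays."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(train_x, train_y):
    """Compute one centroid per class label from the training arrays."""
    by_class = {}
    for row, label in zip(train_x, train_y):
        by_class.setdefault(label, []).append(row)
    return {label: centroid(rows) for label, rows in by_class.items()}


def predict(model, rows):
    """Assign each array to the class whose centroid is closest."""
    return [
        min(model, key=lambda label: squared_distance(model[label], row))
        for row in rows
    ]


def make_arrays(rng, n_arrays, shift, n_genes):
    """Synthetic arrays: Gaussian noise, with the first 5 genes shifted."""
    return [
        [rng.gauss(shift if g < 5 else 0.0, 1.0) for g in range(n_genes)]
        for _ in range(n_arrays)
    ]


rng = random.Random(0)
N_GENES = 50  # assumption: far fewer than the 3051 genes in the real dataset

# Class counts taken from the text; expression values are synthetic.
train_y = ["ALL"] * 21 + ["AML"] * 9
test_y = ["ALL"] * 6 + ["AML"] * 2
train_x = make_arrays(rng, 21, 0.0, N_GENES) + make_arrays(rng, 9, 3.0, N_GENES)
test_x = make_arrays(rng, 6, 0.0, N_GENES) + make_arrays(rng, 2, 3.0, N_GENES)

model = fit(train_x, train_y)      # build the predictor on the train set
predicted = predict(model, test_x)  # predict the class of each test array
accuracy = sum(p == t for p, t in zip(predicted, test_y)) / len(test_y)

print("Predicted:", " ".join(predicted))
print("Accuracy on the test arrays:", accuracy)
```

With the real data, `train_x` and `test_x` would be read from the train and test files instead of being generated, and the predicted labels would be compared against the known test classes in the same way.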