Clustering analysis for expression data in arthritis

The etiology of rheumatoid arthritis is not known with certainty. To help clarify it, an expression microarray study has been proposed that will characterize the disease at the molecular level and identify key mechanisms that could improve its prevention and treatment.


Goal

Detect homogeneous groups of subjects according to their transcriptomic profile and assess whether any anomalous patterns are present.


Data

We have normalized data from Affymetrix microarrays for three experimental groups:

  • 5 patients with rheumatoid arthritis (RA1-RA5).
  • 4 patients with osteoarthritis (OA1-OA4).
  • 6 healthy people (H1-H6).

Work plan

  1. Open the gene expression data file in a spreadsheet and inspect its contents. There will be as many columns as subjects and as many rows as genes.
  2. Upload this txt file to Babelomics from the “Upload” menu. We have to indicate the type of data being uploaded: “Data matrix expression”. This link describes the data types that Babelomics accepts: https://github.com/babelomics/babelomics/wiki/Data-types.
  3. Next, select clustering by samples. Choose the “SOTA” clustering method and the “Pearson correlation coefficient” distance. Assign a name to the job and run it.
  4. Perform a clustering by genes (to begin with, keep the default settings). Assign a name to the job and run it.
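
For readers who want to see what step 3 does outside Babelomics, the following is a minimal Python sketch of clustering samples with a Pearson correlation distance. It rests on several assumptions: the expression matrix is a synthetic placeholder generated at random (not the data for this exercise), the number of genes (200) and the noise level are arbitrary, and average-linkage hierarchical clustering from SciPy stands in for SOTA, which SciPy does not provide. Only the sample names (RA1-RA5, OA1-OA4, H1-H6) come from the Data section.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Sample names follow the naming used in the Data section.
samples = (
    [f"RA{i}" for i in range(1, 6)]
    + [f"OA{i}" for i in range(1, 5)]
    + [f"H{i}" for i in range(1, 7)]
)
group_of = [name.rstrip("0123456789") for name in samples]

# Synthetic placeholder matrix (genes x samples); NOT the exercise data.
# Each group gets its own random base profile plus per-sample noise.
rng = np.random.default_rng(0)
n_genes = 200  # arbitrary, for illustration only
base = {g: rng.normal(size=n_genes) for g in ("RA", "OA", "H")}
expr = np.column_stack(
    [base[g] + rng.normal(scale=0.3, size=n_genes) for g in group_of]
)

# Pearson correlation distance between samples: d = 1 - r.
# pdist compares rows, so transpose to samples x genes.
dist = pdist(expr.T, metric="correlation")

# Average-linkage hierarchical clustering (a stand-in for SOTA),
# cut so that at most three clusters are returned.
tree = linkage(dist, method="average")
labels = fcluster(tree, t=3, criterion="maxclust")

for name, cluster in zip(samples, labels):
    print(name, cluster)
```

With real data, the matrix would be read from the txt file instead of being generated, and the number of clusters would not be known in advance; inspecting the tree itself is usually more informative than a fixed cut.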

Questions

  1. Are there groups of samples with a similar transcriptomic profile? How many groups appear?
  2. Does any sample behave anomalously compared with the other subjects? If so, what would you propose doing about it?
  3. Do you think that a differential expression analysis would yield a large number of differentially expressed genes?
  4. Did any issue arise when clustering by genes?