RNA-Seq data from Breast Invasive Carcinoma (BRCA)


1. Open the file tcga_rnaseq.txt and explore its content. Description:

  • RNA-Seq data of BRCA samples taken from The Cancer Genome Atlas (TCGA) data portal (http://cancergenome.nih.gov/).
  • Contains 10 normal samples and 20 tumor samples of 2 subtypes (Basal-like and Her2-enriched).
  • This dataset was normalized with TMM (trimmed mean of M-values).

2. Upload your file to Babelomics 5.0.

3. Go to the section Expression > Clustering and try several clustering strategies for samples and genes:

  • UPGMA + Euclidean (square)
  • UPGMA + Correlation coeff. (Spearman)
  • Which distance parameter is better for proper clustering?
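
  To get a feeling for why the distance parameter matters, here is a minimal Python sketch of the two options. It assumes that the Spearman-based distance is 1 minus the Spearman correlation, which may differ from what Babelomics computes internally. The function names and the toy profiles are made up for illustration and are not taken from tcga_rnaseq.txt.

```python
def squared_euclidean(x, y):
    """Sum of squared differences between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def ranks(values):
    """Rank of each value (0 = smallest). Ties are not averaged in this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for position, index in enumerate(order):
        result[index] = float(position)
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_distance(x, y):
    """Assumed definition: 1 - Spearman correlation (Pearson on ranks)."""
    return 1.0 - pearson(ranks(x), ranks(y))


# Toy profiles: b has the same shape as a at ten times the scale,
# c has similar magnitudes to a but the opposite trend.
a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
c = [5, 4, 3, 2, 1]

print(squared_euclidean(a, b))  # 4455
print(squared_euclidean(a, c))  # 40
print(spearman_distance(a, b))  # 0.0
print(spearman_distance(a, c))  # 2.0
```

  In this toy case the squared Euclidean distance places a closer to c, while the Spearman-based distance places a closer to b, because the first compares magnitudes and the second compares only the ordering of the values.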

4. Repeat the analysis using the same distance parameters and the SOTA method.

  • SOTA + Euclidean (square)
  • SOTA + Correlation coeff. (Spearman)
  • Do the results change based on the method or the distance parameter?

5. Try to cluster your samples with K-means.

  • Set the k-value to 6 and use Correlation coeff. (Spearman).
  • Repeat the same analysis with a k-value of 3.
  • Check the results of K-means.
  • Are the results acceptable?
  • Does the dendrogram represent any hierarchy among the samples?
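
  For reference, the K-means procedure can be sketched in a few lines of Python. This is not the Babelomics implementation. It approximates a correlation-based K-means by rank-transforming and standardizing each profile first, since the squared Euclidean distance between standardized vectors is proportional to 1 minus their correlation. The helper names, the deterministic choice of starting centroids and the toy profiles are assumptions made for illustration.

```python
def rank_zscore(values):
    """Rank-transform a profile, then scale the ranks to mean 0 and variance 1."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = float(position)  # ties are not averaged in this sketch
    mean = sum(ranks) / len(ranks)
    sd = (sum((r - mean) ** 2 for r in ranks) / len(ranks)) ** 0.5
    return [(r - mean) / sd for r in ranks]


def kmeans(points, k, iterations=100):
    """Plain Lloyd-style K-means; returns one cluster label per point."""
    # Assumption: start from the first k points instead of a random choice.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no label changed
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, assignment) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Toy profiles (4 genes each), made up for illustration.
profiles = [
    [1, 2, 3, 4],       # increasing
    [4, 3, 2, 1],       # decreasing
    [10, 20, 30, 45],   # increasing, larger scale
    [40, 35, 20, 10],   # decreasing, larger scale
]
points = [rank_zscore(p) for p in profiles]
labels = kmeans(points, 2)
print(labels)  # [0, 1, 0, 1]
```

  Note that the return value is a flat list of cluster labels, one per sample.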

6. Cluster your samples with K-means once more.

  • Set the k-value to 2 and use Correlation coeff. (Spearman).
  • Can we say that K-means is good at distinguishing tumor from normal samples?
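
  One way to make this question concrete is to cross-tabulate the cluster labels against the known sample types and compute the cluster purity. The Python sketch below assumes you have copied the cluster assignments by hand from the K-means results. The function names and the toy labels are hypothetical and do not come from the dataset.

```python
from collections import Counter


def contingency(clusters, sample_types):
    """Count how many samples of each type fall into each cluster."""
    return Counter(zip(clusters, sample_types))


def purity(clusters, sample_types):
    """Fraction of samples that belong to the majority type of their cluster."""
    table = contingency(clusters, sample_types)
    majority = {}
    for (cluster, _sample_type), count in table.items():
        majority[cluster] = max(majority.get(cluster, 0), count)
    return sum(majority.values()) / len(sample_types)


# Toy example (made up): 3 normal and 5 tumor samples split into 2 clusters.
sample_types = ["normal"] * 3 + ["tumor"] * 5
clusters = [1, 1, 2, 2, 2, 2, 2, 1]

table = contingency(clusters, sample_types)
for (cluster, sample_type), count in sorted(table.items()):
    print(cluster, sample_type, count)

print(purity(clusters, sample_types))  # 0.75
```

  A purity close to 1 means that each cluster contains almost only one sample type. With k = 2, a value near the share of the larger class means the clusters do little better than putting everything in one group.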