RNA-Seq data from Breast Invasive Carcinoma (BRCA)
1. Open the file tcga_rnaseq.txt and explore its content. Description:
- RNA-Seq data of BRCA samples taken from The Cancer Genome Atlas (TCGA) data portal (http://cancergenome.nih.gov/).
- Contains 10 normal samples and 20 tumor samples from 2 subtypes (Basal-like and Her2-enriched).
- This dataset was normalized with TMM (trimmed mean of M-values).
2. Upload your file to Babelomics 5.0.
3. Go to section Expression > Clustering and try several clustering strategies for samples & genes:
- UPGMA + Euclidean (square)
- UPGMA + Correlation coeff. (Spearman)
- Which distance parameter is better for proper clustering?
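
The two distance options in step 3 can be compared on a toy example. The sketch below is not Babelomics code: it assumes the Spearman distance is defined as 1 minus the Spearman correlation, and the three sample vectors are invented for illustration.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_distance(x, y):
    """Assumed definition: 1 - Spearman correlation (Pearson on ranks)."""
    return 1.0 - pearson(ranks(x), ranks(y))

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Invented expression profiles over five genes.
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [10.0, 20.0, 30.0, 40.0, 50.0]  # same gene ordering as a, 10x the scale
sample_c = [5.0, 4.0, 3.0, 2.0, 1.0]       # reversed gene ordering, same scale as a

print(squared_euclidean(sample_a, sample_b))  # 4455.0
print(squared_euclidean(sample_a, sample_c))  # 40.0
print(spearman_distance(sample_a, sample_b))  # 0.0
print(spearman_distance(sample_a, sample_c))  # 2.0
```

In this toy case the squared Euclidean distance places sample_a closer to sample_c (similar magnitude), while the Spearman distance places it closer to sample_b (same gene ranking). Keep this difference in mind when comparing the two trees.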
4. Repeat the analysis using the same distance parameters with the SOTA (Self-Organizing Tree Algorithm) method.
- SOTA + Euclidean (square)
- SOTA + Correlation coeff. (Spearman)
- Do the results change based on the method or the distance parameter?
5. Try to cluster your samples with K-means.
- Set k-value 6 and use Correlation coeff. (Spearman)
- Repeat the same analysis with k-value 3.
- Check the results of K-means.
- Are the results acceptable?
- Does the dendrogram represent any hierarchy between the samples?
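
As background for step 5, the sketch below shows what a K-means run returns. It is a minimal version of Lloyd's algorithm, not the Babelomics implementation. For brevity it uses squared Euclidean distance rather than Spearman correlation, it starts the centroids at the first k points so that the run is deterministic, and the six 2-D points are invented.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm; centroids are initialised to the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Invented toy data: three tight groups of two points each.
points = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0),
          (0.1, 0.2), (5.2, 4.9), (10.1, 0.3)]

print(kmeans(points, 3))  # [0, 1, 2, 0, 1, 2]
print(kmeans(points, 2))  # [0, 1, 1, 0, 1, 1]
```

The output is one cluster label per sample and nothing else. The algorithm does not compute any merge order or distances between clusters.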
6. Cluster your samples with K-means once more, this time with two clusters.
- Set k-value 2 and use Correlation coeff. (Spearman).
- Can we say that K-means is good to distinguish tumor from normal?
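
One way to put a number on the last question is to cross-tabulate the K-means cluster labels against the known sample classes. The sketch below is only an illustration: the cluster labels and classes are made up (they are not results from tcga_rnaseq.txt), and the function names `contingency` and `purity` are hypothetical helpers.

```python
from collections import Counter

def contingency(cluster_labels, true_classes):
    """Count how many samples fall in each (cluster, class) pair."""
    return Counter(zip(cluster_labels, true_classes))

def purity(cluster_labels, true_classes):
    """Fraction of samples that belong to the majority class of their cluster."""
    table = contingency(cluster_labels, true_classes)
    best = {}
    for (cluster, _), count in table.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(cluster_labels)

# Made-up example: 3 normal and 5 tumor samples, one tumor sample misplaced.
true_classes = ["normal"] * 3 + ["tumor"] * 5
cluster_labels = [0, 0, 0, 1, 1, 1, 1, 0]

for (cluster, cls), count in sorted(contingency(cluster_labels, true_classes).items()):
    print(cluster, cls, count)
print(purity(cluster_labels, true_classes))  # 0.875
```

A purity of 1.0 would mean that every cluster contains samples of a single class. Note that purity always rises as k grows, so it is only meaningful when comparing runs with the same k.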