Methods
The bioinformatics analysis was conducted with version 4.1.2 of the R programming language (R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/). The packages used and their corresponding versions are listed in Supplementary Table S2.
Data collection
Literature screening was conducted in December 2022. For every disease addressed, all selected studies met the following inclusion criteria: i) type of data: transcriptomic data; ii) organism: human (Homo sapiens); iii) type of biological sample processing: frozen postmortem tissue sections. Additional specifications differed for each type of disease. MBM data corresponded to differential gene expression results from MBM versus melanoma non-brain metastasis. To broaden the analysis, results from MBM versus non-tumor-bearing brain controls were also collected. For neurodegenerative diseases, the systematic reviews have already been published (MS in PMID: 37023829, and PD in PMID: 36414996).
Differential gene expression and meta-analyses in neurodegenerative disease studies
A new differential gene expression analysis was performed independently for each selected study of each neurodegenerative disease. The case versus control comparison was tested with the linear regression model of the R limma package (PMID: 25605792). When necessary, batch effect was included as a covariate. P-values were adjusted by the Benjamini and Hochberg (BH) procedure (PMID: 11682119), and a change in gene expression was considered significant when the FDR (False Discovery Rate) was < 0.05. The log2 fold change (logFC) was calculated to define the direction and magnitude of change. Thus, genes with a positive logFC are upregulated in cases (downregulated in controls), and genes with a negative logFC are upregulated in controls (downregulated in cases).
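The BH adjustment and the logFC sign convention can be illustrated with a minimal Python sketch. It is not the limma-based R code used in the analysis: the function names, gene labels, and toy values are hypothetical, and the model fitting itself is not reproduced.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (FDR) in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the
    # adjusted values stay monotone and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(log_fc, fdr, cutoff=0.05):
    """Label a gene by significance and direction of change."""
    if fdr >= cutoff:
        return "not significant"
    # Positive logFC: higher expression in cases; otherwise in controls.
    return "up in cases" if log_fc > 0 else "up in controls"


# Hypothetical toy results for four genes (not data from the study).
genes = ["G1", "G2", "G3", "G4"]
log_fcs = [1.2, -0.8, 0.3, -1.5]
p_vals = [0.001, 0.01, 0.04, 0.5]

fdrs = benjamini_hochberg(p_vals)
labels = [classify(lfc, fdr) for lfc, fdr in zip(log_fcs, fdrs)]
for gene, fdr, label in zip(genes, fdrs, labels):
    print(gene, round(fdr, 4), label)
```

In this toy example G3 has a raw p-value below 0.05 but an adjusted value above it, which is why significance is defined on the FDR rather than on the raw p-value.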
The results were then integrated into five meta-analyses based on the neurodegenerative disease and the brain region examined (AD-CT, AD-HP, PD-SN, PD-ST and MS). Meta-analyses were performed as previously described (García-García F. Methods of functional enrichment analysis in genomic studies [PhD Thesis]. 2016). To account for heterogeneity among the individual studies, we implemented the DerSimonian and Laird random-effects model (PMID: 3802833) using the metafor R package (https://doi.org/10.18637/jss.v036.i03). Meta-analysis statistics were calculated, and p-values were corrected by the BH method (PMID: 11682119). The FDR significance cutoff was set at 0.05.
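For readers unfamiliar with the DerSimonian and Laird estimator, the following Python sketch shows the per-gene computation. It assumes that each study contributes one effect size (for example a logFC) and its sampling variance; that input choice, the function name, and the toy numbers are illustrative assumptions, and the sketch is not the metafor code used here (in metafor, the corresponding estimator is selected with `method = "DL"`).

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-study effects with a DerSimonian-Laird random-effects model."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures the dispersion of studies around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))

    # Two-sided p-value from a normal approximation.
    z = pooled / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return {"estimate": pooled, "se": se, "tau2": tau2, "q": q, "p": p_value}


# Hypothetical toy inputs: one gene measured in three or two studies.
homogeneous = dersimonian_laird([0.5, 0.5, 0.5], [0.04, 0.04, 0.04])
heterogeneous = dersimonian_laird([0.0, 1.0], [0.01, 0.01])
print(homogeneous)
print(heterogeneous)
```

When the studies agree, the between-study variance is estimated as zero and the result matches a fixed-effect model; when they disagree, the standard error widens accordingly.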
Intersection analysis
Significant genes from the MBM differential gene expression results and the neurodegenerative disease meta-analyses were selected using an FDR cutoff of 0.05 and classified by the direction of change in the corresponding statistical comparison: upregulated genes (logFC > 0) and downregulated genes (logFC < 0). We then looked for significant genes shared between MBM and each neurodegenerative disease by performing intersection analyses of the resulting gene lists. Because two studies were available for the MBM versus melanoma non-brain metastasis comparison, we kept only the genes with the same pattern in the MBM-1 and MBM-2 studies. To visualize the results, we generated bar plots with the R package ggplot2 (Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org), upset plots with the R package UpSetR (PMID: 28645171), and heatmaps with the R package ComplexHeatmap (https://doi.org/10.1002/imt2.43).
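The selection and intersection logic can be sketched with plain set operations. The Python example below is a minimal illustration under stated assumptions: gene names and values are hypothetical, a gene is treated as consistent only when it is significant with the same sign in both MBM studies, and all four direction pairings are reported because the text does not restrict the pairing.

```python
def significant_by_direction(results, cutoff=0.05):
    """Split a {gene: (logFC, FDR)} mapping into up and down gene sets."""
    up = {g for g, (lfc, fdr) in results.items() if fdr < cutoff and lfc > 0}
    down = {g for g, (lfc, fdr) in results.items() if fdr < cutoff and lfc < 0}
    return up, down


# Hypothetical toy results: gene -> (logFC, FDR).
mbm1 = {"GENE_A": (1.5, 0.01), "GENE_B": (-2.0, 0.001),
        "GENE_C": (0.9, 0.03), "GENE_D": (1.1, 0.20)}
mbm2 = {"GENE_A": (1.1, 0.02), "GENE_B": (-1.2, 0.04),
        "GENE_C": (-0.7, 0.01), "GENE_D": (1.3, 0.01)}
neuro = {"GENE_A": (0.4, 0.01), "GENE_B": (0.6, 0.03),
         "GENE_C": (0.5, 0.01), "GENE_D": (0.2, 0.01)}

up1, down1 = significant_by_direction(mbm1)
up2, down2 = significant_by_direction(mbm2)

# Keep only genes with the same significant pattern in both MBM studies.
mbm_sets = {"up": up1 & up2, "down": down1 & down2}

neuro_up, neuro_down = significant_by_direction(neuro)
neuro_sets = {"up": neuro_up, "down": neuro_down}

# Intersect every MBM direction with every neurodegenerative direction.
overlaps = {
    (mbm_dir, nd_dir): mbm_sets[mbm_dir] & neuro_sets[nd_dir]
    for mbm_dir in ("up", "down")
    for nd_dir in ("up", "down")
}
for pairing, shared in overlaps.items():
    print(pairing, sorted(shared))
```

Here GENE_C is dropped because its direction differs between the two MBM studies, and GENE_D because it is significant in only one of them.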
Resampling
We performed a resampling analysis to assess whether the genes resulting from the intersection analysis reflected a non-random biological signal or had arisen by chance. For each pair of MBM and neurodegenerative disease results, we first selected the genes assessed in both diseases. We then performed 10,000 iterations of the following process: (1) randomly select as many genes as were identified in the intersection analysis and (2) count how many of these genes are significant in the two MBM studies. Finally, we calculated the median across the simulations.
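One possible reading of this procedure is sketched below in Python. Sampling without replacement, the fixed random seed, the function name, and the toy gene universe are all assumptions made for illustration; the original R implementation may differ in these details.

```python
import random
import statistics


def resample_overlap(common_genes, mbm_significant, n_draw,
                     iterations=10_000, seed=0):
    """Median number of MBM-significant genes among random draws.

    common_genes: genes assessed in both diseases (sampling universe).
    mbm_significant: genes significant in the two MBM studies.
    n_draw: number of genes found in the intersection analysis.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pool = sorted(common_genes)
    counts = []
    for _ in range(iterations):
        drawn = rng.sample(pool, n_draw)  # draw without replacement
        counts.append(sum(1 for gene in drawn if gene in mbm_significant))
    return statistics.median(counts), counts


# Hypothetical toy universe: 1,000 genes, 50 of them MBM-significant.
universe = [f"gene{i}" for i in range(1000)]
mbm_hits = set(universe[:50])

median_count, counts = resample_overlap(universe, mbm_hits, n_draw=20)
print("median overlap expected by chance:", median_count)
```

The observed overlap can then be compared against this median (or against the whole distribution in `counts`) to judge whether it exceeds what random gene selection would produce.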
Functional signatures of the common transcriptomic features
The resulting gene profiles were functionally characterized by two strategies: protein-protein interaction (PPI) analysis and overrepresentation analysis (ORA). PPI analyses were conducted using the STRING R package (PMID: 30476243). The interaction networks were generated using default parameters, evaluating both functional and physical protein associations. Networks were considered significant when the PPI enrichment p-value was < 0.05. For visualization, disconnected proteins were hidden, and the interaction confidence value was used to represent the strength of the interactions. We also screened the biological relationships in the selected gene sets. Biological annotations were obtained using the AnnotationDbi R package (Pagès H, Carlson M, Falcon S, Li N (2023). AnnotationDbi: Manipulation of SQLite-based annotations in Bioconductor. R package version 1.62.1) from the org.Hs.eg.db R package (Carlson M (2019). org.Hs.eg.db: Genome wide annotation for Human. R package version 3.8.2.): i) Reactome pathways, ii) KEGG pathways, iii) GO Biological Processes, iv) GO Molecular Functions, and v) GO Cellular Components, the last three being from the Gene Ontology database. The functional tests were performed by ORA with the clusterProfiler R package (PMID: 34557778). We analyzed gene sets containing between 10 (minimum size) and 500 (maximum size) genes, thus filtering out overly specific and overly general results. We calculated the GeneRatio, BgRatio, q-value and p-value statistics. P-values were adjusted by the BH method (PMID: 11682119), and results were considered statistically significant when FDR < 0.05. Plots were generated with the corresponding functions of the enrichplot R package (Yu G (2023). enrichplot: Visualization of Functional Enrichment Result. R package version 1.20.0).
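The statistical core of ORA is a one-sided hypergeometric test per gene set. The Python sketch below illustrates that test together with the 10 to 500 gene size filter and the BH correction. It is a simplified illustration, not the clusterProfiler implementation: the function names, the toy universe, and the gene sets are hypothetical, the ratios use the full query list and universe as denominators, and the q-value is not computed.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N and K of the N are annotated."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def ora(gene_list, gene_sets, universe, min_size=10, max_size=500):
    """Overrepresentation test of gene_list against each gene set."""
    universe = set(universe)
    query = set(gene_list) & universe
    rows = []
    for name, members in gene_sets.items():
        annotated = set(members) & universe
        # Skip overly specific or overly general gene sets.
        if not min_size <= len(annotated) <= max_size:
            continue
        k = len(query & annotated)
        p = hypergeom_upper_tail(k, len(query), len(annotated), len(universe))
        rows.append({"set": name, "p": p,
                     "GeneRatio": k / len(query),
                     "BgRatio": len(annotated) / len(universe)})
    for row, fdr in zip(rows, bh_adjust([r["p"] for r in rows])):
        row["fdr"] = fdr
    return {row["set"]: row for row in rows}


# Hypothetical toy data: 100 background genes and three gene sets.
universe = [f"g{i}" for i in range(100)]
gene_sets = {
    "S1": universe[:20],     # contains the whole query list
    "S2": universe[50:70],   # no overlap with the query list
    "tiny": universe[:5],    # below the minimum size, filtered out
}
results = ora(universe[:10], gene_sets, universe)
for name, row in results.items():
    print(name, row)
```

The size filter is applied after restricting each gene set to the background universe, so the thresholds refer to genes that could actually be drawn.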
Web tool
The NAME web tool is an open resource for exploring in depth the data and results presented in this manuscript. It has been developed with the Quarto system (https://github.com/quarto-dev/quarto-cli), providing a user-friendly and interactive environment to navigate through the website. Plots have been generated with the ggplot2 (Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org), plotly (Sievert C (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC. ISBN 9781138331457, https://plotly-r.com) and enrichplot (Yu G (2023). enrichplot: Visualization of Functional Enrichment Result. R package version 1.20.0) R packages. Results are displayed in seven sections: i) overview of the implemented pipeline, ii) MBM expression profiles, iii) neurodegenerative disease expression profiles, iv) neurodegenerative signature and functional profiling of brain metastasis-specific signatures (MBM-1-2 results), v) neurodegenerative signature and functional profiling of the MBM tumor profile (MBM-3 results), vi) methods description, and vii) code availability.