Methods
Bioinformatics and statistical analysis were run in R software v.3.5.0 [1].
1. Selection of gene expression data sets
This systematic review of the GEO and ArrayExpress databases was performed with the keywords “high fat diet”, “nafld” and “nash” [9, 10]. The period reviewed ranged from January 2002 (generation of the public repositories) to May 2020. Results were filtered by:
- organism: "Mus musculus"
- study type: "expression profiling by array" or "expression profiling by high throughput sequencing"
- tissue: “liver”
- information about inclusion criteria not available
- diet with less than 35% kcal of fat
- any genetic alteration (ob / ob, Ldlr - / -, etc.)
- diets deficient in choline and/or methionine
- oophorectomy
- presence of other treatments that enhance liver damage, such as carbon tetrachloride or streptozotocin
- strain different from C57BL/6J
- less than 3 replicates in any diet
- diet duration of less than 8 weeks
- transcriptomic studies in which only non-coding RNA was analysed.
2. Data acquisition and preprocessing
The studies included in the systematic review were downloaded and standardized. Gene names were
converted to Gene Symbol nomenclature and the median expression was calculated when there were
multiple values for the same gene. Individuals in each study were tagged as HFD, ND or HFD plus
treatment (HFDt). After data normalization, an exploratory analysis was performed to detect
possible batch effects and anomalous behavior among the data, using clustering and principal
component analysis (PCA).
3. Differential Gene Expression (DGE)
After normalizing the data, different DGE analyses were performed for each individual study to
detect differences in gene expression levels between groups. A comparison of HFD vs. ND was made
to obtain a transcriptomic signature of the pathology, while a comparison of HFDt vs. HFD was
performed to detect the anti-steatotic effect of the treatments. HFDt vs ND were compared to
assess the degree of reversion of the detected alterations. When different treatments were present
in the same study, we performed a specific DGE for each of them. The limma R package [12] was used
to calculate the differential expression statistics. P-values were adjusted by the Benjamini &
Hochberg (BH) method [13].
.
4. Meta-analysis of DGE
For the HFD vs. ND comparison, meta-analysis techniques were used to obtain a robust transcriptomic HFD signature from the differential expression results of each individual study. The DGE results of the HFDt vs. HFD comparison were meta-analyzed to detect the common signal of the effect of the anti-steatotic treatments. The meta-analyses were performed following the methodology described by García-García et al [14]. In brief, the metafor R package [15] was used to combine the expression results of the corresponding individual studies, using a random-effects model [16]. Thus, the meta-analysis calculated a pooled measure of the expression effect for each gene across all the studies; to do this we used the logarithm of the fold change (logFC), for which the less variable results were given greater weight, and tested its significance. We then adjusted the p-value by the BH method, and established 0.05 as the significance cutoff. To explore the impact of the anti-steatotic treatments on the transcriptomic HFD signature, we performed a correlation analysis at gene level, using Pearson's correlation coefficient and a 95% confidence interval.
5. Functional enrichment
To elucidate the molecular basis underlying the genetic differences found in the DGE, a functional enrichment of each individual study was performed using gene set analysis (GSA). The ontology used was Gene Ontology (GO) biological processes (BP) [17]. GSA was carried out using the logistic regression model implemented in the mdgsa R package [18]. The logarithm of the odds ratio (LOR) is the measure of effect in functional enrichment analysis and is obtained directly from logistic regression. Due to their hierarchical structure, the gene annotations with GO terms applied in the mdgsa package were propagated to inherit the annotations of the ancestor terms. Excessively specific or generic annotations (blocks smaller than ten or larger than 500 genes, respectively) were subsequently filtered out. Finally, functions with a Benjamini & Yekutieli (BY) adjusted p-value [19] under 0.05 were considered significant.
6. Meta-analysis of Functional Enrichment
The GSA results of the individual studies were meta-analyzed following the same procedure described in "Meta-analysis of DGE". We performed one meta-analysis for each of the three above-mentioned comparisons, in this case obtaining the LOR as a measure of the magnitude of change.
7. Summary and representation of altered GO terms in meta-analyses
To summarize the significantly altered GO terms in the three meta-analyses, we took advantage of the DAG (directed acyclic graph) structure of the GO database, in which specific lower-level terms are linked to general higher-level terms. We selected a group of representative high-level GO terms with low overlapping rates, which we called categories, and classified each significant GO term in a category if it was an ancestor of the term. The selected categories are:
- Cell death
- Cell population proliferation
- Cellular component organization or biogenesis
- Cytokine production
- Homeostatic process
- Immune system process
- Ion transport
- Lipid metabolic process
- Localization
- Metabolic process
- Multicellular organismal process
- Nitrogen compound metabolic process
- Regulation of metabolic process
- Response to stress
8. Cell culture
Hepatocyte-like HepaRG cells were used as a hepatocyte model. Undifferentiated HepaRG cells (HPRGC10) were purchased from Gibco (Thermo Fisher Scientific, Waltham, MA, USA), and cultured and differentiated as previously described [20]. To generate fatty acid-overloaded hepatocytes (FA-overloaded), cells were incubated in medium supplemented with 0.5 mM sodium oleate and 0.25 mM sodium palmitate (both from Sigma-Aldrich, Stenheim, Germany), conjugated with BSA (Roche Life Science, Penzberg, Germany) (ratio 6:1), and 0.05 mM of L-carnitine (Sigma-Aldrich) for 48 h. LX2 cells, a human immortalized line of hepatic stellate cells (HSC), were purchased from Sigma-Aldrich and cultured in Dulbecco's Modified Eagle Medium with high glucose concentration (Gibco, Thermo Fisher Scientific), supplemented with 10% heat-inactivated fetal bovine serum (iFBS, Sigma-Aldrich) and 50 U/ml streptomycin and 50 U/ml penicillin (both from Gibco, Thermo Fisher Scientific). These cells were treated for 48 h with the profibrogenic stimulus transforming growth factor (TGFβ1, 2.5 ng/ml, Miltenyi Biotec) or its vehicle (DMSO, Sigma Aldrich).
9. RNA extraction and real-time RT-PCR
Total mRNA isolation from cell cultures and real-time RT-PCR were performed as described previously [21]. Primer sequences of human genes were synthesized by IDT® (Integrated DNA Technologies, Coralville, IA, USA) or Metabion (Planegg, Germany) and are shown in Supplementary Table 1. β-actin was analyzed as a house-keeping gene.
10. Data availability statement
All data sets analyzed in the current study are available from Gene Expression Omnibus. All data generated during this study are provided as Supplementary Material and/or are available in the webtool resource and the Zenodo open data repository.