Methods
This web site summarizes the bioinformatics and statistical methods used. A more extended version of the complete methods can be found in the article.
After LC-MS/MS analysis, annotation and identification of the lipids, the analyses described in Figure 1 and detailed below were performed. The software used was R v3.6.3 [1].
Bioinformatic Analyses
1. Data Preprocessing
Data preprocessing comprised entity filtering, normalization of the lipid abundance matrix, and exploratory analyses. MassHunter Qualitative results (.cef file) were imported into Mass Profiler Professional (MPP) (Agilent Technologies) for statistical analysis, and separate experiments were created for the positive and negative ion modes. Entities were filtered by frequency, retaining those consistently present in all replicates of at least one treatment. A percentile shift normalization algorithm (75th percentile) was applied, and datasets were baselined to the median of all samples. When a lipid was duplicated with different retention times, the median of its abundance values was used. Normalization was followed by exploratory analysis using cluster analysis, principal component analysis (PCA), and box-and-whisker plots by sample and by lipid, to detect abundance patterns between samples and lipids, batch effects, and anomalous behavior in the data. At this point, samples with anomalous behavior and outliers (values lying more than 1.5 × the interquartile range (IQR) below the first quartile (Q1) or above the third quartile (Q3) of the data set) were excluded because they showed a strong batch effect with a critical impact on the differential abundance analysis.
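The normalization and outlier rules above can be illustrated with a minimal Python sketch. It is not the MPP implementation. It assumes that abundances are already log2-transformed, that the percentile shift subtracts each sample's 75th percentile, and that baselining subtracts each lipid's median across samples. The sample and lipid names are hypothetical.

```python
import math
import statistics

def percentile(values, q):
    """Percentile (q in [0, 100]) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def percentile_shift(matrix, q=75):
    """Subtract each sample's q-th percentile from its log2 abundances.

    matrix: dict mapping sample -> dict mapping lipid -> log2 abundance.
    """
    shifted = {}
    for sample, row in matrix.items():
        shift = percentile(list(row.values()), q)
        shifted[sample] = {lipid: value - shift for lipid, value in row.items()}
    return shifted

def baseline_to_median(matrix):
    """Subtract from every lipid its median across all samples."""
    lipids = list(next(iter(matrix.values())).keys())
    medians = {
        lipid: statistics.median(row[lipid] for row in matrix.values())
        for lipid in lipids
    }
    return {
        sample: {lipid: value - medians[lipid] for lipid, value in row.items()}
        for sample, row in matrix.items()
    }

def iqr_outliers(values, k=1.5):
    """Return values more than k * IQR below Q1 or above Q3."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

# Hypothetical data: three samples that differ only by a sample-wide offset.
log2_abundance = {
    "sample_1": {"lipid_a": 10.0, "lipid_b": 12.0, "lipid_c": 14.0, "lipid_d": 16.0, "lipid_e": 18.0},
    "sample_2": {"lipid_a": 11.0, "lipid_b": 13.0, "lipid_c": 15.0, "lipid_d": 17.0, "lipid_e": 19.0},
    "sample_3": {"lipid_a": 9.0, "lipid_b": 11.0, "lipid_c": 13.0, "lipid_d": 15.0, "lipid_e": 17.0},
}
shifted = percentile_shift(log2_abundance, q=75)
baselined = baseline_to_median(shifted)
flagged = iqr_outliers([1, 2, 3, 4, 5, 100])
```

In this toy example the percentile shift removes the sample-wide offsets, so the three samples become identical before baselining. The same quartile definition is reused for the 1.5 × IQR rule.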
2. Differential Lipid Abundance
Lipid abundance levels were compared between groups using the limma R package [2]. P-values were adjusted using the Benjamini-Hochberg (BH) procedure [3], and lipids with a BH-adjusted p-value ≤ 0.05 were considered significant.
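As an illustration of the BH adjustment step, the following Python sketch applies the standard step-up procedure to a list of p-values. It is an independent re-implementation for explanatory purposes, not the R code used in the analysis. The example p-values are made up.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for four lipids.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
significant = [i for i, p in enumerate(adj) if p <= 0.05]
```

With these four values, every adjusted p-value stays at or below 0.05, so all four indices are reported as significant.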
3. Class Enrichment Analysis
Class annotation was conducted using the RefMet database [4] and compared with the LIPID MAPS database [5]. The classification is hierarchical: lipids are first divided into several principal categories ("super classes"), each containing distinct main classes and sub classes of molecules, which provides a standard way of representing the chemical structures of individual lipids and their derivatives. Abbreviations are described in the supplementary files. After annotation, lipids were ordered according to the p-value and the sign of the statistic obtained in the differential lipid abundance analysis. Analogous to Gene Set Enrichment Analysis (GSEA), class enrichment was then assessed using Lipid Set Enrichment Analysis (LSEA), implemented in the mdgsa R package [6]. P-values were adjusted using the BH procedure, and classes with a BH-adjusted p-value ≤ 0.05 were considered significant.
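The ordering of lipids by p-value and sign of the statistic can be sketched in Python as follows. The signed index used here, sign(statistic) × (−log10 p), is an assumption chosen for illustration. The exact transformation applied by mdgsa may differ. The lipid names and values are hypothetical.

```python
import math

def ranking_index(pvalue, statistic):
    """Signed index: large positive = strongly increased, large negative = strongly decreased.

    Assumed form: sign(statistic) * -log10(pvalue).
    """
    sign = (statistic > 0) - (statistic < 0)
    return sign * -math.log10(pvalue)

# Hypothetical differential abundance results: lipid -> (p-value, statistic).
results = {
    "PC 34:1": (0.001, 2.5),
    "TG 52:2": (0.5, -0.3),
    "Cer 18:1": (0.01, -4.0),
}

index = {lipid: ranking_index(p, stat) for lipid, (p, stat) in results.items()}

# Lipids ordered from most increased to most decreased.
ranked = sorted(index, key=index.get, reverse=True)
```

An enrichment method then tests whether the lipids of a given class concentrate at either end of this ranking.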
4. Comparisons
Three comparisons were performed for each group (human, WT mice, TLR4-KO mice) to analyze differential lipid abundance:
- The ethanol effects in females (EEF), which compares ethanol-intoxicated females and control females.
- The ethanol effects in males (EEM), which compares ethanol-intoxicated males and control males.
- Sex-ethanol interaction (SEI), which compares EEF and EEM.
Class enrichment analysis was performed using the same three comparisons in human samples.
The statistics used to measure the differential patterns were the logarithm of the fold change (LFC), which quantifies the effect in the differential lipid abundance analysis, and the logarithm of the odds ratio (LOR), which measures the enrichment of each functional class. A positive statistic indicates a higher mean for the variable in the first element of the comparison, whereas a negative statistic indicates a higher mean in the second element. The SEI comparison focuses on differences between the female and male comparisons. Thus, a positive statistic may indicate either upregulation in females and downregulation in males, or a higher increase or a lower decrease of the variable in intoxicated female subjects. Conversely, a negative statistic may indicate either upregulation in males and downregulation in females, or a higher increase or a lower decrease of the variable in intoxicated male subjects. In this comparison, the behavior of each lipid across the groups must be assessed a posteriori by examining the female and male comparisons.
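The arithmetic behind the three comparisons can be illustrated with a short Python sketch for a single lipid. It only computes differences of group means on the log scale. It does not reproduce the linear modelling or the test statistics of limma. The group labels (EF, CF, EM, CM) and abundance values are hypothetical.

```python
from statistics import mean

def contrasts(groups):
    """Log fold changes for one lipid.

    groups: dict with keys "EF", "CF", "EM", "CM" (ethanol/control,
    female/male), each mapping to a list of log2 abundances.
    """
    eef = mean(groups["EF"]) - mean(groups["CF"])  # ethanol effect in females
    eem = mean(groups["EM"]) - mean(groups["CM"])  # ethanol effect in males
    sei = eef - eem                                # sex-ethanol interaction
    return {"EEF": eef, "EEM": eem, "SEI": sei}

# Hypothetical log2 abundances of one lipid in the four groups.
lipid = {
    "EF": [3.0, 5.0],
    "CF": [2.0, 2.0],
    "EM": [1.0, 3.0],
    "CM": [2.0, 4.0],
}
lfc = contrasts(lipid)
```

Here the lipid increases with ethanol in females (EEF = 2) and decreases in males (EEM = −1), giving a positive SEI of 3. A positive SEI could equally arise from an increase in both sexes that is larger in females, which is why the EEF and EEM values must be inspected alongside it.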
In addition, a correlation analysis was conducted between the differential abundance results of the different comparisons. Pearson's correlation coefficient measured the relationship between these differential profiles, providing an overall picture, while the intersection of the significant lipids between comparisons provided a specific view of the results. The two approaches are complementary and improve the understanding of the contrasts evaluated.
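Both views can be sketched in a few lines of Python: Pearson's correlation between the LFC profiles of two comparisons, and the intersection of their significant lipids. The lipid names, LFC values, and significance sets below are hypothetical and serve only to show the computation.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical LFC profiles from two comparisons.
lfc_a = {"PC 34:1": 1.0, "TG 52:2": 2.0, "Cer 18:1": 3.0, "SM 36:1": -1.0}
lfc_b = {"PC 34:1": 2.0, "TG 52:2": 4.0, "Cer 18:1": 6.0, "LPC 16:0": 0.5}

# Global view: correlate the LFCs of the lipids measured in both comparisons.
shared = sorted(set(lfc_a) & set(lfc_b))
r = pearson([lfc_a[k] for k in shared], [lfc_b[k] for k in shared])

# Specific view: lipids significant in both comparisons (hypothetical sets).
significant_a = {"PC 34:1", "Cer 18:1", "SM 36:1"}
significant_b = {"Cer 18:1", "LPC 16:0"}
common_significant = significant_a & significant_b
```

The correlation summarizes how similarly the two comparisons behave across all shared lipids, while the intersection names the individual lipids supported by both.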
References
- R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria, 2019.
- Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43: e47.
- Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc Ser B Methodol. 1995;57: 289–300.
- Fahy E, Subramaniam S. RefMet: a reference nomenclature for metabolomics. Nat Methods. 2020;17: 1173–1174.
- Sud M, Fahy E, Cotter D, Brown A, Dennis EA, Glass CK, et al. LMSD: LIPID MAPS structure database. Nucleic Acids Res. 2007;35: D527–D532.
- Montaner D, Dopazo J. Multidimensional Gene Set Analysis of Genomic Data. Hoheisel J, editor. PLoS ONE. 2010;5: e10348.