Network enrichment analysis: SNOW

INTRODUCTION

SNOW stands for “Studying Networks in the Omic World”. SNOW extracts and evaluates the cooperative behavior of lists of proteins/genes in terms of protein-protein interactions. SNOW thus complements other Babelomics tools such as FatiGO by introducing a new dimension into the functional profiling of high-throughput experimental results: protein-protein interaction data.

Protein-protein interactions are central to almost every level of cell function. A great effort is being made to provide a high-quality map of the complete set of protein-protein interactions in the cell: the interactome. Introducing this kind of data into functional genomics may provide important clues for understanding cell activity.

We used data from the main public protein-protein interaction databases (HPRD, IntAct, BIND, DIP and MINT) to generate two interactomes: a non-filtered interactome that takes all the protein-protein interactions from the databases, and a filtered interactome containing only the protein-protein interactions detected by at least two different methodologies. Users may also submit their own interactions.

SNOW identifies hubs in the list of proteins/genes (nodes) and evaluates the global degree of connections, centrality and neighborhood aggregation of the list. It does so by comparing the distributions of node connection degree, betweenness centrality and clustering coefficient, respectively, against the complete distributions of these parameters in the reference interactome. In addition, SNOW extracts the minimum network that connects the proteins/genes in the list. A user-defined number of external proteins is allowed to connect nodes in the list. The topology of this network is evaluated by comparing the distributions of its node, edge and graph parameters against pre-calculated distributions from a set of 10,000 random lists in the same size range. In this way, SNOW reports whether the network represented by the list has more hubs, is more connected or has a more regular distribution of connections than a random network.
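
As an illustration of two of the node parameters mentioned above, the following minimal Python sketch computes the connection degree and the clustering coefficient of the nodes of a list within a toy interactome. It is not SNOW's own code: the function names (`build_graph`, `degree`, `clustering_coefficient`) and the toy interactions are assumptions made for this example only.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from pairs."""
    graph = {}
    for a, b in edges:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph, node):
    """Number of interaction partners of a node."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours of `node` that interact with each other."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy interactome and list (not real SNOW data).
toy_interactome = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
graph = build_graph(toy_interactome)
my_list = ["A", "C", "D"]

list_degrees = [degree(graph, n) for n in my_list]
list_clustering = [clustering_coefficient(graph, n) for n in my_list]
print(list_degrees, list_clustering)
```

The values obtained for the list can then be compared against the same parameters computed for every node of the reference interactome.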

SNOW also provides an interactive visualization of the network and a complete description of the interactome and local network parameters of each protein/gene in the list, as well as of the external nodes introduced by the program. This information, together with the functional annotation provided, helps the user to identify the important nodes within, or even outside, the list and to evaluate the modular functionality of the list as an entity.

A comparison of two lists is also implemented along the same lines.

SNOW Tutorial

SNOW is a web-based tool that introduces protein-protein interaction data into the functional profiling of genome-scale experiments. From a list of pre-selected proteins or genes, it extracts the minimal connected network (the smallest network that connects all the elements of the list) that they form in terms of physical interactions. It then evaluates the topological parameters of that network by comparing them against same-size networks generated from random lists of genes/proteins.

SNOW can perform the following analyses:

  • Find modules of genes/proteins with a structural component within lists of pre-selected proteins/genes and evaluate their topological parameters.
  • Evaluate the importance of a set of proteins/genes in the human interactome by identifying hubs, central proteins and proteins in highly interconnected areas, and evaluate the role of the list as a unit of action within the human interactome in terms of centrality, connectivity and clustering coefficient.
  • Load the user's interaction data and find modules of proteins/genes defined by this data within pre-selected lists. Find the role of these lists in your interactomic data by evaluating their topological parameters.
  • Compare the role of two lists of proteins/genes in the human interactome.
  • Get the minimal connected networks of two lists of proteins/genes and compare their topological parameters.

In this tutorial we show the different uses of SNOW. Several examples are also provided; they include the lists of genes needed to perform the analyses as well as the pre-calculated results.

The following sections are the recipes to follow according to the data you have and depending on what type of analysis you want to perform.

1. One list analysis

1.1. Scenario: You have a list of proteins or genes that you have selected for a particular reason. For instance, they are the result of a differential expression analysis of a microarray experiment, they are the spots identified as differentiating two samples in a two-dimensional gel electrophoresis, or they show an interesting pattern of expression across several samples in a transcriptomic analysis. From this list of proteins/genes you want to find out whether they have something in common in terms of functionality. The list has been selected for some particular reason that a priori suggests that its members might be functionally related. Several programs/tools try to extract the functionalities behind a set of genes/proteins; FatiGO and Marmite are two examples (find them within Babelomics). SNOW is another program with this aim. Its particularity is that it uses protein-protein interaction data and evaluates protein/gene modules with a structural component.

1.2. What can you get using SNOW?: SNOW can map the genes/proteins in your list into the human interactome and extract the important ones. It can also evaluate the list as a unit and tell you whether it is enriched in hubs or central proteins, or whether its members lie in highly interconnected areas. Furthermore, SNOW calculates the minimal connected network of the list (the smallest network that connects the elements in the list) and evaluates its topology by comparing it against same-size networks generated from random lists of proteins/genes.

1.3. Parameters to choose: In the SNOW web form you can choose between analysing one or two lists. Choose the one list tab and you will find the following options:

  1. Select interactome. Choose the interactome you want to use in the analysis. There are two possibilities: a non-curated interactome (all ppis) and a curated interactome (ppis detected by at least two methods). See SNOW help for more information about how they were generated. If you have your own interactomic data, you can submit your own interactions in the Paste your own interactions box in tabulated or .sif (Cytoscape-compatible) format.
  2. Proteins in interactions and list in same id (shown when your own interactions are selected). This applies only if you submit your own interactions: tell SNOW whether your interactomic data and your list use the same identifier type.
  3. Max number of external proteins introduced. This option tells SNOW how it should generate the Minimal Connected Network (MCN). The network is generated by calculating the shortest paths between all pairs of proteins/genes in your list. Only some of these shortest paths are added to the MCN: those that join two elements of the list directly and those that join two elements through at most the selected number of external nodes. Select this number of external nodes, from 0 to 3.
  4. Nature of your list (Proteins/Transcripts). Tell SNOW whether you are submitting a list of proteins or genes.
  5. Give a name to the job (optional).
  6. Press submit and wait until the job has finished. A normal job usually takes less than a minute, but the time may vary depending on the size of the list.
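
The MCN construction described under Max number of external proteins introduced can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not SNOW's implementation: it keeps only one shortest path per pair (found by breadth-first search), and it assumes that a path is accepted when the number of intermediate nodes that are not in the list is at most `max_external`. The function names and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, source, target):
    """Return one shortest path between two nodes (BFS), or None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def minimal_connected_network(graph, protein_list, max_external=1):
    """Collect the edges of accepted shortest paths between list members.

    Assumption: a path is accepted if it uses at most `max_external`
    intermediate nodes that are not members of the list.
    """
    members = set(protein_list) & set(graph)
    edges = set()
    for a, b in combinations(sorted(members), 2):
        path = shortest_path(graph, a, b)
        if path is None:
            continue
        external = [n for n in path[1:-1] if n not in members]
        if len(external) <= max_external:
            edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    return edges


# Hypothetical toy interactome: the chain A - X - B - C - Y - Z - D.
toy_graph = {
    "A": {"X"}, "X": {"A", "B"}, "B": {"X", "C"}, "C": {"B", "Y"},
    "Y": {"C", "Z"}, "Z": {"Y", "D"}, "D": {"Z"},
}
mcn_one = minimal_connected_network(toy_graph, ["A", "B", "C", "D"], max_external=1)
mcn_zero = minimal_connected_network(toy_graph, ["A", "B", "C", "D"], max_external=0)
print(sorted(tuple(sorted(e)) for e in mcn_one))
```

In this toy case, allowing one external protein connects A to B and C through X, while D stays disconnected because reaching it would require two external nodes (Y and Z).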

1.4. Some constraints: Currently SNOW accepts lists in the range of 3 to 500 proteins/genes that can be mapped into the reference interactome.

1.5. Proteins/Gene IDs supported:

  • affy_focus
  • affy_hcg110
  • affy_hugene
  • affy_u133
  • affy_u95
  • agilent_cgh
  • agilent_probe
  • biocarta
  • biocyc
  • ccds
  • cisred
  • codelink
  • embl
  • ensembl_gene
  • ensembl_prot
  • ensembl_transcript
  • ensembl_transcript_same_CDS
  • entrezgene
  • genbank
  • havana_transcript
  • hugo
  • illumina_v1
  • illumina_v2
  • imgt/gene_db
  • imgt/ligm_db
  • ipi
  • locuslink
  • pdb
  • refseq_dna
  • refseq_dna_predicted
  • refseq_peptide
  • refseq_peptide_predicted
  • ucsc
  • unigene
  • uniprot/splicevariant id
  • uniprot/swissprot id
  • uniprot/trembl id

1.6. Some specifications: If you submit your own interactions, the MCN will not be compared versus same-size networks generated from random lists because this process is highly time-consuming and the distributions of the network parameters must be pre-calculated.

1.7. Output:

1.7.1. Statistic Results
  • Interactome images. Boxplots of the list distributions of the gene/protein parameters mapped into the interactome versus the parameter distributions in the interactome of reference. P-value for the Kolmogorov-Smirnov test.

  • Network images. Boxplots of the list and random distributions for the Minimal Connected Networks generated from lists. P-value for the Kolmogorov-Smirnov test.
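
To give an idea of what the Kolmogorov-Smirnov test measures, here is a minimal Python sketch of the two-sample statistic: the largest distance between the empirical cumulative distributions of two samples. The function name and the example values are assumptions made for illustration; this sketch does not compute the p-value, which in practice could be obtained from a statistics library (for example `scipy.stats.ks_2samp`), and it does not claim to reproduce how SNOW itself runs the test.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic (maximum distance between ECDFs)."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    i = j = 0
    d_max = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the ECDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d_max = max(d_max, abs(i / len(a) - j / len(b)))
    return d_max


# Hypothetical connection degrees: nodes of a list versus the whole interactome.
list_degree_values = [5, 7, 9, 12]
interactome_degree_values = [1, 1, 2, 2, 3, 5, 7, 9, 12, 2]
d = ks_statistic(list_degree_values, interactome_degree_values)
print(d)
```

A large statistic indicates that the list and the reference distributions differ, for instance because the list is enriched in highly connected nodes.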

1.7.2. Network Functional information - List #1 results.
  1. BiComponents & subnetwork parameters.
  2. Components & subnetwork parameters.
  3. Shortest paths found with a maximum of 1 external protein introduced
  4. BiComponents & articulation points
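
For readers unfamiliar with the graph terms used in these tables, the following minimal Python sketch finds the connected components and the articulation points (nodes whose removal disconnects their component) of a small network. It is a generic textbook depth-first search, not SNOW's code; the function names and the toy network are assumptions, bicomponents themselves are not computed, and the recursive version is only suitable for small networks.

```python
def connected_components(graph):
    """Return the connected components of an undirected graph as sets of nodes."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def articulation_points(graph):
    """Return the nodes whose removal would split their component."""
    depth, low, points = {}, {}, set()

    def visit(node, parent, d):
        depth[node] = low[node] = d
        children = 0
        for neighbour in graph[node]:
            if neighbour == parent:
                continue
            if neighbour in depth:
                # Back edge: the node can reach an ancestor directly.
                low[node] = min(low[node], depth[neighbour])
            else:
                children += 1
                visit(neighbour, node, d + 1)
                low[node] = min(low[node], low[neighbour])
                if parent is not None and low[neighbour] >= depth[node]:
                    points.add(node)
        if parent is None and children > 1:
            points.add(node)

    for node in graph:
        if node not in depth:
            visit(node, None, 0)
    return points


# Hypothetical toy network: triangle A-B-C, tail C-D-E, and a separate pair F-G.
toy_network = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E"}, "E": {"D"}, "F": {"G"}, "G": {"F"},
}
components = connected_components(toy_network)
cut_nodes = articulation_points(toy_network)
print(components, cut_nodes)
```

Here the network has two components, and C and D are articulation points because removing either of them isolates part of the first component.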

1.7.3. SNOW viewer - Minimal Connected Network visualization.

Users may view the generated network in a user-friendly window that allows them to manipulate the network and obtain functional information interactively.

Nodes belonging to the same component are shown in the same color. The color intensity of a node within each component indicates the centrality of the node within the complete MCN: higher intensity corresponds to higher betweenness centrality. The size of the nodes has no meaning; it only varies because of differences in label length.

The applet has several options to facilitate the exploration of the MCN. For example, nodes or edges can be hidden and restored afterwards (show/hide nodes/edges option), gene/protein names can be shown or hidden, and the dynamic layout can be switched off so that nodes can be moved freely. Here is the legend (under the info button) with some help on visualization.

2. Some datasets to run SNOW

Here are several examples of lists of genes selected to differentiate two samples in microarray experiments. The description of the experiment is given.

The SNOW parameters used to perform the analyses were:

  • Interactome of reference: ppis detected by two methods.
  • Maximum number of external proteins: 1
  • Nature of the lists: Genes

Download the lists and perform your own SNOW analyses using the same or different parameters. For reference, we provide the results pages as you would obtain them. Have a look at them and compare them with SNOW results obtained using different parameters, bearing in mind that the results shown here may have been run with a different version of the ppi data.

Example number  Dataset                     Description
2.1             brca1_overexp_up            Upregulated by induction of exogenous BRCA1 in EcR-293 cells
2.2             brca1_overexp_dn            Downregulated by induction of exogenous BRCA1 in EcR-293 cells
2.3             serum_fibroblast_cellcycle  Cell-cycle dependent genes regulated following exposure to serum in a variety of human fibroblast cell lines
2.4             ageing_brain_dn             Age-downregulated in the human frontal cortex
2.5             brca1_sw480_up              Up-regulated by infection of human colon adenocarcinoma cells (SW480) with Ad-BRCA1, versus Ad-LacZ control
2.6             et743_resist_dn             Down-regulated in two Et-743-resistant cell lines (chondrosarcoma and ovarian carcinoma) compared to sensitive parental lines
2.7             hematop_stem_all_up         Up-regulated in populations of human hematopoietic stem cells (CD34+/CD38-/Lin-) from bone marrow, umbilical cord blood, and peripheral blood stem-progenitor cells, compared to the stem cell-depleted population (CD34+/[CD38/Lin++])
2.8             oldage_dn                   Downregulated in fibroblasts from old individuals, compared to young
2.9             brca2_brca1_up              Genes up-regulated in BRCA2-linked breast tumors, relative to BRCA1-linked tumors
2.10            hdaci_colon_cur2hrs_up      Upregulated by curcumin at 2 hrs in SW260 colon carcinoma cells
2.11            p21_p53_any_dn              Down-regulated at any timepoint (4-24 hrs) following ectopic expression of p21 (CDKN1A) in OvCa cells, p53-dependent

3. Using your own protein-protein interactions

SNOW lets you use your own ppi dataset as the background interactome for the analysis. You may submit ppi data in .sif or tabulated format (see examples below).
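
As a rough guide to the two input formats, the Python sketch below reads interaction text into a set of protein pairs. It assumes that .sif lines follow the usual Cytoscape layout (source node, interaction type, then one or more target nodes) and that the tabulated format consists of two columns holding the two interacting proteins. The exact tabulated layout expected by SNOW is not specified here, so that part, like the function name and the example interactions, is an assumption.

```python
def parse_interactions(text):
    """Parse .sif or two-column tabulated text into a set of sorted protein pairs."""
    pairs = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Use tabs as delimiter when present, otherwise any whitespace.
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) == 2:
            # Assumed tabulated format: proteinA <TAB> proteinB
            source, targets = fields[0], fields[1:]
        elif len(fields) >= 3:
            # .sif format: source, interaction type, one or more targets
            source, targets = fields[0], fields[2:]
        else:
            continue  # a single node without partners carries no interaction
        for target in targets:
            if target != source:
                pairs.add(tuple(sorted((source, target))))
    return pairs


# Hypothetical example mixing both layouts.
example_text = "TP53\tpp\tMDM2\nBRCA1 pp BARD1 RAD51\nEGFR\tGRB2\n"
interactions = parse_interactions(example_text)
print(sorted(interactions))
```

Storing each pair in sorted order means that the same interaction listed in both directions is counted only once.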

When using your own ppi data, SNOW calculates the topological parameters of the complete ppi dataset provided by the user. The submitted list of proteins/genes is then tested to check whether it is enriched in hubs, central nodes or well-interconnected areas compared with the whole dataset.

SNOW generates the MCN of the submitted list of proteins/genes and presents its functional annotation. The comparison of the MCN topological parameters against a set of random lists is not performed when using your own interactions because of computational constraints: the generation of 10,000 MCNs can take one or two days.

To show an example of a SNOW analysis using your own interactions, we have generated a .sif file with all protein-protein interactions from the complete collection of KEGG signalling pathways. This dataset may represent a subset of the interactome concentrated on proteins associated with the signalling machinery of the cell.

The lists used for these examples were extracted from a study that identifies essential genes in different types of cancer (Luo B. et al., 2008). A SNOW analysis of this set of lists, using the signalling pathways as the interactome, determines the role of these lists within the signalling machinery of the cell.

Example number  ppi dataset (.sif)            List    Description
3.1             kegg_signalling_pathways.sif  UL2     200 essential genes in Glioblastoma (UL2 cell line)
3.2             kegg_signalling_pathways.sif  H1975   200 essential genes in Non-Small-cell Lung cancer (H1975 cell line)

4. Two lists analysis

4.1. Scenario: You have two lists of proteins or genes that you have selected for a particular reason. For instance, they are the result of a differential expression analysis of a microarray experiment (over- and under-expressed genes), they are the spots identified as differentiating two samples in a two-dimensional gel electrophoresis, or they show an interesting pattern of expression across several samples in a transcriptomic analysis.

Now you want to compare both lists to see how different they are in terms of the internal structure formed by their physical interactions.

4.2. What can you get using SNOW?: SNOW can map the genes/proteins in your lists into the human interactome and extract the important ones as well as compare them in terms of connectivity, centrality and clustering coefficient within the whole interactome.

Furthermore, SNOW will calculate a minimal connected network from both lists and then compare their topology to check which one is more structured and in which terms.

4.3. Parameters to choose: In the SNOW web form you can choose between analysing one or two lists. Choose the two lists tab and you will find the following options (see image below):

  1. Select interactome: Choose the interactome you want to use in the analysis. There are two possibilities: a non-curated interactome (all ppis) and a curated interactome (ppis detected by at least two methods). See SNOW help for more information about how they were generated. If you have your own interactomic data, you can submit your own interactions in the Paste your own interactions box in tabulated or .sif (Cytoscape-compatible) format.
  2. Proteins in interactions and lists in same id?: This applies only if you submit your own interactions: tell SNOW whether your interactomic data and your lists use the same identifier type.
  3. Max number of external proteins introduced: This option tells SNOW how it should generate the Minimal Connected Network (MCN). The network is generated by calculating the shortest paths between all pairs of proteins/genes in your list. Only some of these shortest paths are added to the MCN: those that join two elements of the list directly and those that join two elements through at most the selected number of external nodes. Select this number of external nodes, from 0 to 3.
  4. Insert the lists of genes or proteins. The format is simple: one element per line. See the input files of the exercises at the end of this tutorial.
  5. Submit lists of proteins or genes: Tell SNOW whether you are submitting lists of proteins or genes.
  6. Give the job a name (optional).
  7. Press submit and wait until the job has finished. A normal job usually takes less than a minute, but the time may vary depending on the size of the lists.

4.4. Two lists analysis output

4.4.1. Statistic Results
  • Interactome images. Boxplots of the distributions of both lists for the gene/protein parameters mapped into the interactome of reference. P-value for the Kolmogorov-Smirnov test.

  • Network images. Boxplots of both Lists distributions for the Minimal Connected Networks generated. P-value for the Kolmogorov-Smirnov test.

4.4.2. Network Functional information

A set of tables provides as much information as possible about the functionality of the minimal connected networks, with one set of tables per list:

  • Functional information about the proteins/genes in the list, and also about the ones introduced by SNOW.
  • Shortest paths within the network.
  • Functional information on components, bicomponents and articulation points.

Topological & Functional information

  1. BiComponents & subnetwork parameters.
  2. Components & subnetwork parameters.
  3. Shortest paths found with a maximum of 1 external proteins introduced
  4. BiComponents & articulation points

4.4.3. SNOW viewer - Minimal Connected Network visualization.

Users may view the generated networks in a user-friendly window that allows them to manipulate the networks and obtain functional information interactively.

Nodes belonging to the same component are shown in the same color. The color intensity of a node within each component indicates the centrality of the node within the complete MCN: higher intensity corresponds to higher betweenness centrality. The size of the nodes has no meaning; it only varies because of differences in label length.

The applet has several options to facilitate the exploration of the MCN. For example, nodes or edges can be hidden and restored afterwards (show/hide nodes/edges option), gene/protein names can be shown or hidden, and the dynamic layout can be switched off so that nodes can be moved freely. Here is the legend (under the info button) with some help on visualization.

snow.txt · Last modified: 2010/05/31 13:00 by jsantoyo