Impact on Proteotype: Pipeline STEP7

STEP 7

Identification of stable and variable protein complexes

The database of complexes was manually compiled and curated from COMPLEAT and CORUM by Ori et al. (2016), and quantified proteins from all published datasets considered for the analysis were mapped to it. A subset of these manually curated protein complexes was classified as ‘well-defined’ (see Ori et al., 2016). For further analysis, only protein complexes with at least 5 quantified members in a given dataset were considered for that dataset. As a general principle, we used the median co-abundance of proteins within a complex (measured as the Pearson correlation between its members) as a proxy to differentiate between stable and variable complexes. To compare complex stability and variability across datasets, complexes were ranked by their median correlation within each dataset; the median rank of each complex across all considered datasets was then calculated, and complexes were sorted accordingly. The top quartile (25%) of complexes was considered highly stable (Pearson’s r > 0.46), whereas the bottom quartile was considered highly variable (Pearson’s r < 0.2).
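The filtering, ranking and quartile classification described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code in wp_step7_code.py: it assumes each dataset is held as a dict mapping protein IDs to abundance vectors across samples, and that complexes are given as a dict mapping complex IDs to member lists (the actual layout of complex_dictionary.pkl is not described here). Co-abundance is taken as the Pearson correlation of every member pair, and all function names are hypothetical.

```python
import numpy as np

MIN_MEMBERS = 5  # minimum number of quantified complex members per dataset


def median_coabundance(data, complexes, min_members=MIN_MEMBERS):
    """Median pairwise Pearson r among the quantified members of each complex.

    data: dict protein ID -> 1-D array of abundances across samples
          (one dataset; an assumed, hypothetical in-memory layout).
    complexes: dict complex ID -> list of member protein IDs.
    Complexes with fewer than `min_members` quantified members are skipped.
    """
    scores = {}
    for cid, members in complexes.items():
        quantified = [p for p in members if p in data]
        if len(quantified) < min_members:
            continue
        matrix = np.vstack([data[p] for p in quantified])
        r = np.corrcoef(matrix)  # protein-by-protein Pearson correlation matrix
        upper = np.triu_indices(len(quantified), k=1)  # each protein pair once
        scores[cid] = float(np.median(r[upper]))
    return scores


def rank_within_dataset(scores):
    """Rank complexes by median r within one dataset (1 = lowest r)."""
    ordered = sorted(scores, key=scores.get)
    return {cid: rank for rank, cid in enumerate(ordered, start=1)}


def classify_complexes(per_dataset_scores, fraction=0.25):
    """Median rank across datasets; top/bottom `fraction` = stable/variable."""
    ranks = [rank_within_dataset(s) for s in per_dataset_scores]
    all_ids = set().union(*ranks)
    # Median over the datasets in which the complex passed the member filter.
    median_rank = {
        cid: float(np.median([r[cid] for r in ranks if cid in r]))
        for cid in all_ids
    }
    ordered = sorted(median_rank, key=median_rank.get, reverse=True)  # most stable first
    n = int(round(len(ordered) * fraction))
    stable = ordered[:n]
    variable = ordered[len(ordered) - n:]  # empty list when n == 0
    return median_rank, stable, variable
```

In this sketch the quartile split on median ranks alone defines the two classes; the Pearson values quoted above are not used as cut-offs. Ranks are raw, as in the text; if datasets differ strongly in the number of retained complexes, a percentile rank would be one way to make them more comparable.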

To assess the consistency of the complex variability landscape, we calculated the Spearman correlation of the ranked median co-abundance between datasets (as illustrated in Figure 3). As a reference distribution, we permuted the dataset 1000 times and recomputed the Spearman correlation coefficients between datasets each time. We then compared the observed distribution of correlation values with the one derived from the random permutations using a two-sided t-test. Being two-sided, the test does not presume a direction for the difference, and the use of a t-test is justified by the normality of the reference distribution.
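The consistency test can be sketched as follows. This is a minimal sketch, assuming the per-dataset scores are dicts mapping complex IDs to median co-abundance values. The text does not specify exactly what was permuted; as a simplifying assumption, this sketch shuffles the complex labels of the scores independently within each dataset, which destroys any cross-dataset agreement while keeping each dataset's score distribution. The function names are hypothetical; the statistics come from `scipy.stats.spearmanr` and `scipy.stats.ttest_ind` (two-sided by default).

```python
import itertools

import numpy as np
from scipy.stats import spearmanr, ttest_ind


def pairwise_spearman(per_dataset_scores):
    """Spearman rho between every pair of datasets over their shared complexes."""
    rhos = []
    for a, b in itertools.combinations(per_dataset_scores, 2):
        shared = sorted(set(a) & set(b))
        if len(shared) < 3:
            continue  # too few shared complexes for a meaningful rank correlation
        rho, _ = spearmanr([a[c] for c in shared], [b[c] for c in shared])
        rhos.append(rho)
    return np.array(rhos)


def permutation_test(per_dataset_scores, n_perm=1000, seed=0):
    """Compare observed between-dataset rhos with a label-shuffling null.

    Assumption: the null shuffles complex labels within each dataset,
    not the underlying protein abundances.
    """
    rng = np.random.default_rng(seed)
    observed = pairwise_spearman(per_dataset_scores)
    null = []
    for _ in range(n_perm):
        shuffled = []
        for scores in per_dataset_scores:
            ids = list(scores)
            values = rng.permutation([scores[c] for c in ids])
            shuffled.append(dict(zip(ids, values)))
        null.extend(pairwise_spearman(shuffled))
    # Two-sided (default) independent-samples t-test: observed vs. permuted rhos.
    t_stat, p_value = ttest_ind(observed, null)
    return observed, np.array(null), t_stat, p_value
```

A large positive t statistic with a small p-value indicates that the ranking of complexes agrees across datasets far more than expected by chance.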

Data and code available for download

complex_filtered_battle_protein.tsv.gz (3MB)

complex-mapped and -filtered proteomics data from Battle et al. (2015), Science (Human Individuals)

complex_filtered_gygi1.tsv.gz (2.2MB)

complex-mapped and -filtered proteomics data from Chick et al. (2016), Nature (Founder Mouse strains, MS-proteomics)

complex_filtered_gygi3.tsv.gz (2.8MB)

complex-mapped and -filtered proteomics data from Chick et al. (2016), Nature (DO Mouse strains, MS-proteomics)

complex_filtered_mann.tsv.gz (3.7MB)

complex-mapped and -filtered proteomics data from Geiger et al. (2012), Mol Cell Proteomics (Human Cell Types)

complex_filtered_tcga_breast.tsv.gz (2.2MB)

complex-mapped and -filtered proteomics data from Mertins et al. (2016), Nature (TCGA Breast Cancer)

complex_filtered_tcga_color.tsv.gz (1.3MB)

complex-mapped and -filtered proteomics data from Roumeliotis et al. (2017), Cell (TCGA Colorectal Cancer)

complex_filtered_tcga_ovarian.tsv.gz (2.9MB)

complex-mapped and -filtered proteomics data from Zhang et al. (2016), Cell (TCGA Ovarian Cancer)

complex_dictionary.pkl (892MB)

pickle-dictionary containing complex information

Download all input data for this step here (278MB)

wp_step7_code.py

Python code required for calculating and visualizing protein co-variation landscape across datasets.

underlying_data_for_Figure3.zip (25MB)

Underlying data for Figure 3

Figure 3
