Impact on Proteotype: Pipeline STEP8

STEP 8

Protein complex stoichiometry analysis

To assess compositional rearrangements of protein complexes, as opposed to changes in their overall abundance, a module-wise normalization was performed as previously described [Ori et al., 2013; Ori et al., 2016]. Proteins belonging to the same complex were normalized by the respective trimmed mean (or interquartile mean) of the complex subunits across all individuals/samples. For proteins involved in multiple complexes, the average value across all corresponding complexes was used. From the complex-normalized abundances, the variance of each subunit in a given complex was calculated. To make these variances comparable between different proteomics datasets and approaches, they were converted to z-scores per complex (Figure 4).

As in the test of consistency between datasets described in the section above, we calculated the correlation coefficients between datasets for each such z-score matrix. To compile a reference distribution, we permuted each matrix and recalculated the corresponding correlation coefficients 1000 times, which yields an approximately normal distribution. We then compared the real distribution of correlation values with the one derived from the random permutations of the dataset using a two-sided t-test.

Protein subunits within a complex were classified as ‘stable’ or ‘variable’ when the associated p-value, based on the distribution of z-scores, was below 0.05 (Table S3). To determine whether a given protein is consistently ‘variable’ in a complex across all datasets, the distribution of its z-scores within the complex and across all datasets was compared with the z-score distribution of all other protein components of the same complex across all datasets (one-sided t-test). A one-sided t-test accounts for the unidirectionality of our hypothesis and gives conservative results. This procedure was applied to all proteins in a given complex, and the resulting p-values were adjusted using the Benjamini-Hochberg method [Benjamini and Hochberg, 1995].
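The normalization, z-score conversion and one-sided test described above can be illustrated with a minimal sketch. It is not the code in wp_step8_code.py; the function names, the simulated toy data and several choices are assumptions made for illustration: abundances are taken to be on a log scale (so normalizing means subtracting the trimmed mean), the trimming proportion is 0.25 per tail (the interquartile mean), the t-test is the default equal-variance Student test, and "variable" is tested as z-scores greater than those of the other subunits.

```python
import numpy as np
import pandas as pd
from scipy import stats


def complex_normalize(abundance, members, trim=0.25):
    """Center each sample on the trimmed mean of the complex subunits.

    Assumption: log-scale abundances (rows: proteins, columns: samples),
    so normalization is a subtraction. trim=0.25 is the interquartile mean.
    """
    sub = abundance.loc[members]
    center = sub.apply(lambda col: stats.trim_mean(col.dropna(), trim), axis=0)
    return sub - center


def variance_zscores(normalized):
    """Variance of each subunit across samples, as z-scores within the complex."""
    variances = normalized.var(axis=1, ddof=1)
    return (variances - variances.mean()) / variances.std(ddof=0)


def one_sided_pvalues(z):
    """Compare each subunit's z-scores (columns: datasets) with all other subunits."""
    pvals = {}
    for protein in z.index:
        own = z.loc[protein].dropna().to_numpy()
        others = z.drop(index=protein).to_numpy().ravel()
        others = others[~np.isnan(others)]
        # Assumed direction: a 'variable' subunit has larger z-scores than the rest.
        pvals[protein] = stats.ttest_ind(own, others, alternative="greater").pvalue
    return pd.Series(pvals)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


# Toy example: one complex with six subunits measured in three simulated datasets.
rng = np.random.default_rng(0)
subunits = [f"P{i}" for i in range(1, 7)]
z_by_dataset = {}
for name in ["dataset_a", "dataset_b", "dataset_c"]:
    data = pd.DataFrame(rng.normal(0.0, 0.3, size=(6, 20)), index=subunits)
    data.loc["P6"] += rng.normal(0.0, 2.0, size=20)  # P6 is made artificially variable
    z_by_dataset[name] = variance_zscores(complex_normalize(data, subunits))

z = pd.DataFrame(z_by_dataset)
pvals = one_sided_pvalues(z)
adjusted = pd.Series(benjamini_hochberg(pvals.to_numpy()), index=pvals.index)
print(adjusted.sort_values())
```

In this toy setting only the artificially noisy subunit is expected to stand out. With real data, the handling of proteins that belong to several complexes (averaging across complexes) would need to be added before the variance step.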

Data/Code Requirements for downloading

complex_filtered_stoch_battle_protein.tsv.gz (3.7MB)

complex-normalized proteomics data from Battle et al. (2015), Science (Human Individuals)

complex_filtered_stoch_gygi1.tsv.gz (3.1MB)

complex-normalized proteomics data from Chick et al. (2016), Nature (Founder Mouse strains, MS-proteomics)

complex_filtered_stoch_gygi3.tsv.gz (5.8MB)

complex-normalized proteomics data from Chick et al. (2016), Nature (DO Mouse strains, MS-proteomics)

complex_filtered_stoch_mann.tsv.gz (4.5MB)

complex-normalized proteomics data from Geiger et al. (2012), Mol Cell Proteomics (Human Cell Types)

complex_filtered_stoch_tcgaBreast.tsv.gz (2.3MB)

complex-normalized proteomics data from Mertins et al. (2016), Nature (TCGA Breast Cancer)

complex_filtered_stoch_tcgaColoCancer.tsv.gz (1.2MB)

complex-normalized proteomics data from Roumeliotis et al. (2017), Cell (TCGA Colorectal Cancer)

complex_filtered_stoch_tcgaOvarian.tsv.gz (2.7MB)

complex-normalized proteomics data from Zhang et al. (2016), Cell (TCGA Ovarian Cancer)

figure4a_relevant_complexes.tsv (180KB)

relevant complexes for visualization purposes; Figure 4a

complex_dictionary.pkl (892MB)

pickle-dictionary containing complex information

Download all input data for this step here (289MB)

wp_step8_code.py

Python code required for defining variable and stable subunits within complexes.

underlying_data_for_Figure4.zip (941MB)

Underlying data for Figure 4
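As a starting point for working with the files listed above, the following sketch shows one way to read a gzipped TSV table and the pickled dictionary in Python. The helper names are hypothetical, and the column layout of the tables and the internal structure of complex_dictionary.pkl are not documented on this page, so both should be inspected after loading rather than assumed.

```python
import pickle

import pandas as pd


def load_complex_table(path):
    """Read a tab-separated table; gzip compression is inferred from the .gz suffix."""
    return pd.read_csv(path, sep="\t", compression="infer")


def load_complex_dictionary(path):
    """Load a pickled object. Only unpickle files obtained from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # File names as listed above; the files must be downloaded first.
    table = load_complex_table("complex_filtered_stoch_battle_protein.tsv.gz")
    print(table.shape)
    print(table.columns.tolist())  # inspect the layout before relying on it

    complexes = load_complex_dictionary("complex_dictionary.pkl")
    print(type(complexes))
```

Printing the shape, column names and object type first is a deliberate choice here, since any downstream analysis depends on a layout that this page does not specify.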
