STEP 2
In this step, each dataset is checked individually for possible normalization issues and batch effects. Batch effects are assessed by testing each sample's distribution for normality with the Shapiro-Wilk test and by checking whether the center of each distribution is similar to that of the other samples. Most datasets did not require any additional normalization (the pre-processing steps of the respective publications are described in Supplementary Table S1); only Geiger et al. (2012) and Guo et al. (2012) required additional quantile normalization to account for slight deviations of individual samples from a normal distribution.
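The center check and the quantile normalization mentioned above can be illustrated with a minimal, standard-library-only sketch. It is not the repository's code: the function names, the toy values, and the tie handling (order of appearance, no tie averaging) are assumptions, and it expects complete data with the same number of values per sample. The Shapiro-Wilk test itself is not reproduced here; `scipy.stats.shapiro` is one widely available implementation.

```python
from statistics import mean, median


def sample_medians(samples):
    """Return the median of each sample (sample name -> list of log abundances)."""
    return {name: median(values) for name, values in samples.items()}


def quantile_normalize(samples):
    """Map every sample onto one shared reference distribution.

    Assumption: all samples have the same length and contain no missing values.
    Ties are resolved by order of appearance (no tie averaging).
    """
    names = list(samples)
    n = len(samples[names[0]])
    if any(len(samples[name]) != n for name in names):
        raise ValueError("all samples must have the same number of values")

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_values = [sorted(samples[name]) for name in names]
    reference = [mean(col[k] for col in sorted_values) for k in range(n)]

    normalized = {}
    for name in names:
        values = samples[name]
        # Indices of the sample's values in ascending order of value.
        order = sorted(range(n), key=lambda i: values[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace value by the reference quantile
        normalized[name] = out
    return normalized


# Hypothetical toy data: three samples with four proteins each.
toy = {
    "sample_a": [5.0, 2.0, 3.0, 4.0],
    "sample_b": [4.0, 1.0, 4.5, 2.0],
    "sample_c": [3.0, 4.0, 6.0, 8.0],
}
print("medians before:", sample_medians(toy))
normalized = quantile_normalize(toy)
print("medians after: ", sample_medians(normalized))
```

After normalization every sample contains the same set of values, so the sample medians coincide while the within-sample ranking of proteins is preserved.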
This step additionally normalizes the abundances of complex-associated subunits to the trimmed mean of their complexes (as previously described in Ori et al., 2013; Ori et al., 2016). Briefly, proteins belonging to the same complex were normalized by the trimmed mean (or interquartile mean) of that complex's subunits across all individuals/samples. For proteins involved in multiple complexes, the average value over all corresponding complexes was used. From the complex-normalized abundances, the variance of each subunit in a given complex was then calculated. The output directory of this step summarizes these results.
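The complex-based normalization can be sketched as follows. This is a hedged illustration, not the code shipped with this step. It assumes log-scale abundances, so "normalizing by" the trimmed mean is implemented as a subtraction. It also assumes the trimmed mean is taken over the subunits separately for each sample, and that a proportion of 0.25 cut from each end approximates the interquartile mean. All function names and toy values are hypothetical.

```python
from statistics import mean, variance


def trimmed_mean(values, proportion=0.25):
    """Mean after dropping floor(n * proportion) values from each end."""
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)
    return mean(ordered[cut:len(ordered) - cut])


def normalize_to_complexes(abundance, complexes):
    """Center each subunit on the trimmed mean of its complex, per sample.

    abundance: protein -> list of log abundances (one value per sample)
    complexes: complex name -> list of subunit protein names
    Proteins in several complexes receive the average of their
    per-complex normalized profiles.
    """
    profiles = {}  # protein -> one normalized profile per complex
    for members in complexes.values():
        present = [p for p in members if p in abundance]
        if not present:
            continue
        n_samples = len(abundance[present[0]])
        # Trimmed mean over the complex subunits, computed for every sample.
        center = [
            trimmed_mean([abundance[p][s] for p in present])
            for s in range(n_samples)
        ]
        for p in present:
            profile = [abundance[p][s] - center[s] for s in range(n_samples)]
            profiles.setdefault(p, []).append(profile)
    # Average the profiles sample by sample for multi-complex proteins.
    return {p: [mean(col) for col in zip(*ps)] for p, ps in profiles.items()}


def subunit_variance(normalized):
    """Sample variance of each subunit across samples (needs >= 2 samples)."""
    return {p: variance(values) for p, values in normalized.items()}


# Hypothetical toy data: five proteins, three samples, two complexes.
abundance = {
    "P1": [10.0, 10.0, 10.0],
    "P2": [12.0, 12.0, 12.0],
    "P3": [14.0, 16.0, 12.0],
    "P4": [30.0, 30.0, 30.0],  # outlier subunit, removed by the trimming
    "P5": [10.0, 12.0, 14.0],
}
complexes = {
    "complex_x": ["P1", "P2", "P3", "P4"],
    "complex_y": ["P3", "P5"],  # P3 belongs to both complexes
}
normalized = normalize_to_complexes(abundance, complexes)
variances = subunit_variance(normalized)
print(normalized["P3"], variances["P3"])
```

With subtraction as the normalization, averaging the per-complex profiles of a shared subunit is equivalent to subtracting the average of the corresponding complex centers.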
proteomics data from Battle et al. (2015), Science (Human Individuals)
dataset_battle_ribo_remapped.tsv.gz (27MB) ribosome profiling data from Battle et al. (2015), Science (Human Individuals)
dataset_battle_rna_remapped.tsv.gz (28MB) RNAseq data from Battle et al. (2015), Science (Human Individuals)
dataset_gygi1_remapped.tsv.gz (9MB) proteomics data from Chick et al. (2016), Nature (Founder Mouse strains, MS-proteomics)
dataset_gygi2_remapped.tsv.gz (25MB) RNAseq data from Chick et al. (2016), Nature (DO Mouse strains, RNAseq)
dataset_gygi3_remapped.tsv.gz (12.5MB) proteomics data from Chick et al. (2016), Nature (DO Mouse strains, MS-proteomics)
dataset_mann_all_log2_remapped.tsv.gz (15MB) proteomics data from Geiger et al. (2012), Mol Cell Proteomics (Human Cell Types)
dataset_tiannan_remapped.tsv.gz (4.8MB) proteomics data from Guo et al. (2012), Nature Medicine (Human Kidney Cells)
dataset_tcga_breast_remapped.tsv.gz (9.6MB) proteomics data from Mertins et al. (2016), Nature (TCGA Breast Cancer)
dataset_coloCa_remapped.tsv.gz (4.3MB) proteomics data from Roumeliotis et al. (2017), Cell (TCGA Colorectal Cancer)
dataset_tcga_ovarian_remapped.tsv.gz (11MB) proteomics data from Zhang et al. (2016), Cell (TCGA Ovarian Cancer)
dataset_bxdMouse_remapped.tsv.gz (2MB) proteomics data from Williams et al. (2016), Science (BXD Mouse Strains)
Python code required for checking the normalization of datasets and for complex-based normalization.
Result files after filtering for complex-related proteins only.
complex_stoichiometry.zip (52.6MB) Result files after filtering for complex-related proteins normalized to complex.