STEP 6

For the receiver operating characteristic (ROC) analysis across different types of modules in different datasets, condition positives were defined based on the databases outlined above. The smallest set of condition positives comprises 1,540 interactions. For pathways, we excluded interactions within protein complexes (such as the ribosome). For chromosome location, we defined condition-positive “interactions” as pairs of genes encoded on the same chromosome. For the essentiality and housekeeping categories, condition-positive interactions were pairs of essential genes and pairs of housekeeping genes, respectively. The full set of condition negatives consists of all other protein pairs. For computational reasons, we randomly sampled from the full set of condition negatives as many pairs as there were condition positives in the respective category and used these to compute the ROC curves. The area under the curve (AUC) was calculated using the trapezoidal rule.

We applied the Mann-Whitney U statistic, which forms the basis of the AUC calculation in the first place [Hanley et al., 1982; Mason et al., 2002], to test whether correlation values derived from proteins in the same module differ significantly from correlation values derived from random proteins that are not part of any module. To obtain a conservative estimate of the effect size (and p-value), we applied the Mann-Whitney U test 1,000 times, each time to 1,000 items randomly sampled from each of the two distributions, and calculated the mean p-value.
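The ROC and Mann-Whitney computations described above can be sketched as follows. This is a minimal, standard-library-only sketch under stated assumptions, not the code in wp_step6_code.py: all function names are hypothetical, each score is assumed to be the correlation value of one protein pair, random sampling is assumed to be without replacement, and p-values use the tie-corrected normal approximation without continuity correction (scipy.stats.mannwhitneyu is the usual library alternative). It also illustrates the cited link between the two: U divided by n1 * n2 equals the trapezoidal AUC, ties included.

```python
import math
import random

# Hypothetical helpers illustrating the method; not the authors' implementation.


def roc_auc_trapezoid(pos_scores, neg_scores):
    """ROC AUC from the trapezoidal rule over all distinct score thresholds."""
    # Label 1 = condition positive, 0 = condition negative; rank by descending score.
    labelled = sorted([(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores],
                      key=lambda t: t[0], reverse=True)
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    tp = fp = 0
    prev_tpr = prev_fpr = auc = 0.0
    i = 0
    while i < len(labelled):
        # Lower the threshold past every item sharing this score, so ties
        # become one diagonal ROC segment (worth half credit).
        score = labelled[i][0]
        while i < len(labelled) and labelled[i][0] == score:
            tp += labelled[i][1]
            fp += 1 - labelled[i][1]
            i += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0  # trapezoid area
        prev_tpr, prev_fpr = tpr, fpr
    return auc


def balanced_auc(pos_scores, neg_pool, seed=0):
    """Subsample as many negatives as there are positives, then compute the AUC."""
    rng = random.Random(seed)
    neg_sample = rng.sample(neg_pool, len(pos_scores))  # assumed: without replacement
    return roc_auc_trapezoid(pos_scores, neg_sample)


def mann_whitney_u(x, y):
    """U statistic of x vs y (midranks for ties) and a two-sided p-value from
    the tie-corrected normal approximation, without continuity correction."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    r1 = 0.0        # rank sum of x
    tie_term = 0.0  # sum of (t^3 - t) over tie groups
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        r1 += midrank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        t = j - i
        tie_term += t ** 3 - t
        i = j
    u1 = r1 - n1 * (n1 + 1) / 2.0  # equals AUC * n1 * n2 when x are the positives
    sigma = math.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0.0:
        return u1, 1.0  # all values tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2.0) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2.0))


def mean_subsampled_p(in_module, background, n_iter=1000, n_items=1000, seed=0):
    """Mean Mann-Whitney p-value over n_iter draws of n_items from each list."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_iter):
        a = rng.sample(in_module, n_items)
        b = rng.sample(background, n_items)
        total += mann_whitney_u(a, b)[1]
    return total / n_iter
```

The defaults of mean_subsampled_p mirror the 1,000 repetitions of 1,000 items per distribution stated above; pure-Python ranking is slow at that scale, so smaller values are convenient for quick checks.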

dataset_battle_protein_remapped.tsv.gz (9MB)

proteomics data from Battle et al. (2015), Science (Human Individuals)

dataset_battle_ribo_remapped.tsv.gz (27MB)

ribosome profiling data from Battle et al. (2015), Science (Human Individuals)

dataset_battle_rna_remapped.tsv.gz (28MB)

RNAseq data from Battle et al. (2015), Science (Human Individuals)

dataset_gygi1_remapped.tsv.gz (9MB)

proteomics data from Chick et al. (2016), Nature (Founder Mouse strains, MS-proteomics)

dataset_gygi2_remapped.tsv.gz (25MB)

RNAseq data from Chick et al. (2016), Nature (DO Mouse strains, RNAseq)

dataset_gygi3_remapped.tsv.gz (12.5MB)

proteomics data from Chick et al. (2016), Nature (DO Mouse strains, MS-proteomics)

dataset_mann_all_log2_remapped.tsv.gz (15MB)

proteomics data from Geiger et al. (2012), Mol Cell Proteomics (Human Cell Types)

dataset_tiannan_remapped.tsv.gz (4.8MB)

proteomics data from Guo et al. (2012), Nature Medicine (Human Kidney Cells)

dataset_tcga_breast_remapped.tsv.gz (9.6MB)

proteomics data from Mertins et al. (2016), Nature (TCGA Breast Cancer)

dataset_coloCa_remapped.tsv.gz (4.3MB)

proteomics data from Roumeliotis et al. (2017), Cell (TCGA Colorectal Cancer)

dataset_tcga_ovarian_remapped.tsv.gz (11MB)

proteomics data from Zhang et al. (2016), Cell (TCGA Ovarian Cancer)

dataset_bxdMouse_remapped.tsv.gz (2MB)

proteomics data from Williams et al. (2016), Science (BXD Mouse Strains)

roc_temporary_files.zip (13.5MB)

Temporary files produced by the ROC analysis

string_data.zip (301MB)

STRING interaction data for human and mouse, v10.5

housekeeping_genes.txt (70KB)

Housekeeping genes as defined by Eisenberg & Levanon (2013), Trends in Genetics

essentiality_genes.txt (2MB)

Essential genes as defined by Wang et al. (2015), Science

complex_dictionary.pkl (872MB)

Pickled Python dictionary containing protein complex information (752MB)



wp_step6_code.py

Python code required for the ROC analysis.


underlying_data_for_Figure2a.zip (13.5MB)

Underlying data for Figure 2a

underlying_data_for_Figure2b.zip (5.1MB)

Underlying data for Figure 2b

Figure 2