Impact on Proteotype: Pipeline STEP6

STEP 6

Receiver Operating Characteristic (ROC) analysis

For the receiver operating characteristic (ROC) analysis across different types of modules in different datasets, condition positives were defined based on the different databases as outlined above. The smallest set of condition positives contains 1,540 interactions. For pathways, we excluded interactions within protein complexes (such as the ribosome complex). For chromosome location, we defined true positive “interactions” as pairs of genes encoded on the same chromosome. For the categories essentiality and housekeeping role, true positive interactions were defined as pairs of essential genes and pairs of housekeeping genes, respectively. The full set of condition negatives consists of all other pairs of proteins. For computational reasons, we randomly sampled from the full set of condition negatives as many pairs as there were condition positives in the respective category to compute ROC curves. The area under the curve (AUC) was calculated using the trapezoidal rule. We applied the Mann-Whitney U statistic, which forms the basis of the AUC calculation [Hanley et al., 1982; Mason et al., 2002], to test whether correlation values derived from proteins in the same module differ significantly from correlation values derived from random proteins that are not part of any module. To make a conservative estimate of the effect size (and p-value), we applied the Mann-Whitney U test 1000 times, each time to a random sample of 1000 items from each of the two distributions, and calculated the mean p-value.
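The procedure described above can be sketched in a few lines of Python. This is a minimal, standard-library illustration and not the pipeline's actual code: the function names (`roc_auc_trapezoid`, `mann_whitney_u`, `mean_subsampled_p`) and their parameters are hypothetical, and the p-value uses a tie-corrected normal approximation without continuity correction, which is an assumption about the test variant. It shows the trapezoidal AUC, its equivalence to the Mann-Whitney statistic (AUC = U / (n_pos * n_neg)), and the repeated-subsampling mean p-value.

```python
import math
import random
from statistics import mean


def roc_auc_trapezoid(pos_scores, neg_scores):
    """AUC of the ROC curve, integrated with the trapezoidal rule."""
    # Sweep the decision threshold from the highest score to the lowest.
    labelled = sorted(
        [(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores],
        key=lambda pair: -pair[0],
    )
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    tp = fp = 0
    prev_fpr = prev_tpr = auc = 0.0
    i = 0
    while i < len(labelled):
        # Consume all items tied at this score before emitting a ROC point.
        score = labelled[i][0]
        while i < len(labelled) and labelled[i][0] == score:
            tp += labelled[i][1]
            fp += 1 - labelled[i][1]
            i += 1
        fpr, tpr = fp / n_neg, tp / n_pos
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_fpr, prev_tpr = fpr, tpr
    return auc


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value).

    The p-value is a normal approximation with tie correction and no
    continuity correction (an assumption of this sketch).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1..j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u, 1.0  # all values tied: no evidence of a difference
    z = (u - n1 * n2 / 2.0) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2.0))


def mean_subsampled_p(x, y, n_items=1000, n_repeats=1000, seed=0):
    """Mean p-value over repeated tests on random subsamples of x and y."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_repeats):
        xs = rng.sample(x, min(n_items, len(x)))
        ys = rng.sample(y, min(n_items, len(y)))
        p_values.append(mann_whitney_u(xs, ys)[1])
    return mean(p_values)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic correlation-like scores, for illustration only.
    positives = [rng.gauss(0.5, 1.0) for _ in range(1540)]
    all_negatives = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    # Balance the classes: sample as many negatives as there are positives.
    negatives = rng.sample(all_negatives, len(positives))
    print("AUC:", roc_auc_trapezoid(positives, negatives))
    print("mean p:", mean_subsampled_p(positives, all_negatives, n_repeats=100))
```

Capping each test at 1000 items per group keeps the p-value from shrinking simply because one dataset yields far more protein pairs than another, which is what makes the estimate conservative and comparable across datasets. In practice, a library routine such as `scipy.stats.mannwhitneyu` could replace the hand-written test; its default settings may give slightly different p-values than this approximation.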

Data/Code Requirements for downloading

dataset_battle_protein_remapped.tsv.gz (9MB)

proteomics data from Battle et al. (2015), Science (Human Individuals)

dataset_battle_ribo_remapped.tsv.gz (27MB)

ribosome profiling data from Battle et al. (2015), Science (Human Individuals)

dataset_battle_rna_remapped.tsv.gz (28MB)

RNAseq data from Battle et al. (2015), Science (Human Individuals)

dataset_gygi1_remapped.tsv.gz (9MB)

proteomics data from Chick et al. (2016), Nature (Founder Mouse strains, MS-proteomics)

dataset_gygi2_remapped.tsv.gz (25MB)

RNAseq data from Chick et al. (2016), Nature (DO Mouse strains, RNAseq)

dataset_gygi3_remapped.tsv.gz (12.5MB)