Impact on Proteotype: Pipeline STEP1

STEP 1

Module- and ID-mapping to datasets

In this step, the original datasets derived from the publications illustrated in the Computational Pipeline were mapped to protein identifiers (UniProt, Ensembl, Gene Symbol, ...) and to modules. Protein-protein interactions were obtained from the STRING database (version 10.5); an interaction was considered to exist if the STRING combined score was > 0, to be confident if the combined score was > 0.5 (Figure S1), and to be high-confidence if the combined score was > 0.7. The database of complexes was manually compiled and curated from COMPLEAT and CORUM by Ori et al., 2016, and quantified proteins from all published datasets considered for the analysis were mapped accordingly. Pathways were obtained from the Reactome Pathway Database (downloaded in February 2017, http://reactome.org/download-data/). Cellular locations were extracted from the Human Protein Atlas (downloaded in February 2017; Uhlen et al., 2015), considering a protein mapping only if the assignment had been validated, supported, or confirmed by antibody analysis (keyword ‘approved’). Chromosome locations were mapped with the Python package mygene, using the hg19 GenBank assembly for human and the mm10 genome assembly for mouse. Finally, gene essentiality was defined based on the genetic screen performed in the human cell lines KBM7, K562, Jiyoye, and Raji by Wang et al., 2015; genes with a housekeeping role were obtained from the supplementary files of the report by Eisenberg & Levanon, 2013.
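The three STRING score cutoffs described above can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the function names, the example edges, and the protein labels are hypothetical. It also assumes that raw scores arrive on the 0-1000 integer scale used in STRING download files and are rescaled to 0-1 before the cutoffs are applied. Because the tiers are nested (every high-confidence interaction is also confident and also exists), the sketch reports only the most stringent tier each interaction reaches.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Cutoffs taken from the text, on a 0-1 combined-score scale.
EXISTS_CUTOFF = 0.0
CONFIDENT_CUTOFF = 0.5
HIGH_CONFIDENCE_CUTOFF = 0.7


def classify_interaction(combined_score: float) -> Optional[str]:
    """Return the most stringent tier a 0-1 score reaches, or None."""
    if combined_score > HIGH_CONFIDENCE_CUTOFF:
        return "high-confidence"
    if combined_score > CONFIDENT_CUTOFF:
        return "confident"
    if combined_score > EXISTS_CUTOFF:
        return "exists"
    return None


def tier_interactions(
    edges: Iterable[Tuple[str, str, float]],
    scale: float = 1000.0,  # assumption: raw scores are on a 0-1000 scale
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (protein_a, protein_b, raw_score) edges by their top tier."""
    tiers: Dict[str, List[Tuple[str, str]]] = {
        "exists": [],
        "confident": [],
        "high-confidence": [],
    }
    for protein_a, protein_b, raw_score in edges:
        tier = classify_interaction(raw_score / scale)
        if tier is not None:
            tiers[tier].append((protein_a, protein_b))
    return tiers


# Hypothetical edges, for illustration only.
example_edges = [
    ("P1", "P2", 900),
    ("P1", "P3", 600),
    ("P2", "P3", 150),
    ("P3", "P4", 0),
]
tiers = tier_interactions(example_edges)
```

All comparisons are strict, matching the ">" in the text, so a score of exactly 0.7 falls in the confident tier rather than the high-confidence one.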