STEP 1

In this step, the original datasets from the publications shown in the Computational Pipeline are mapped to protein IDs (UniProt, Ensembl, Gene Symbol, ...) and to modules.

Protein-protein interactions were obtained from the STRING database (version 10.5). An interaction was considered to exist if its STRING combined score was > 0, to be confident if the score was > 0.5 (Figure S1), and to be high-confidence if the score was > 0.7. The database of complexes was manually compiled and curated from COMPLEAT and CORUM by Ori et al., 2016, and the quantified proteins from all published datasets considered for the analysis were mapped to it. Pathways were obtained from the Reactome Pathway Database (downloaded February 2017, http://reactome.org/download-data/).

Cellular locations were extracted from the Human Protein Atlas (downloaded February 2017; Uhlen et al., 2015), keeping a protein's location only if the assignment was validated, supported, or confirmed by antibody analysis (keyword ‘approved’). Chromosome locations were mapped with the Python package mygene, using the hg19 GenBank assembly for human and the mm10 genome assembly for mouse. Finally, gene essentiality was defined from the genetic screen performed in the human cell lines KBM7, K562, Jiyoye, and Raji by Wang et al., 2015; housekeeping genes were taken from the supplementary files of Eisenberg & Levanon, 2013.
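The three STRING thresholds can be illustrated with a short sketch. It is not taken from wp_step1_code.py; the function name `confidence_tier` and the tier labels are hypothetical. It also assumes that raw STRING files store the combined score as an integer from 0 to 1000, so the score is rescaled to 0-1 before the cutoffs stated above are applied.

```python
def confidence_tier(combined_score, scale=1000):
    """Classify a STRING combined score using the cutoffs described above.

    Assumption: raw scores are integers in [0, 1000]; pass scale=1 if the
    scores are already on a 0-1 scale.
    """
    score = combined_score / scale
    if score > 0.7:
        return "high-confidence"
    if score > 0.5:
        return "confident"
    if score > 0:
        return "exists"
    return "none"


# Toy interactions (placeholder protein names, not real STRING records).
interactions = [("P1", "P2", 912), ("P1", "P3", 640), ("P2", "P3", 150)]
tiers = {(a, b): confidence_tier(s) for a, b, s in interactions}
```

Because the tiers are nested, an interaction labelled "high-confidence" here also satisfies the "confident" and "exists" criteria.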

original_datasets.zip (49MB)

string_data_original_files.zip (561MB)

Original STRING interaction files for human and mouse, downloaded from STRING v10.5

string_data.zip (309MB)

(Temporary) STRING interaction data for human and mouse, v10.5

tcga_breast_metafiles.zip (36.7MB)

tcga_ovarian_metafiles.zip (47.5MB)

housekeeping_genes.txt (70KB)

Housekeeping genes as defined by Eisenberg & Levanon (2013), Trends in Genetics

essentiality_genes.txt (2MB)

Essential genes as defined by Wang et al. (2015), Science

complex_dictionary.pkl (872MB)

Pickled Python dictionary containing complex information (752MB)
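The internal layout of complex_dictionary.pkl is not documented on this page. As a rough sketch of how quantified proteins could be mapped to complexes, the example below assumes a dictionary from complex name to a collection of member protein IDs. The function name `map_to_complexes` and the toy entries are hypothetical; the real file would be loaded with `pickle.load` and may be organised differently.

```python
def map_to_complexes(complexes, quantified_proteins):
    """Return, per complex, the members that were quantified in a dataset.

    Assumption: `complexes` maps a complex name to an iterable of protein IDs.
    Complexes without any quantified member are omitted from the result.
    """
    quantified = set(quantified_proteins)
    mapped = {}
    for name, members in complexes.items():
        found = sorted(quantified.intersection(members))
        if found:
            mapped[name] = found
    return mapped


# Toy data standing in for the pickled dictionary and one dataset.
toy_complexes = {
    "complex_A": ["P1", "P2", "P3"],
    "complex_B": ["P4", "P5"],
}
toy_quantified = ["P2", "P3", "P9"]
mapped = map_to_complexes(toy_complexes, toy_quantified)
```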

subcellular_location.csv (1.5MB)

Subcellular location as defined by the Human Protein Atlas, Uhlen et al. (2015), Science
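The reliability filter described for the Human Protein Atlas data could look roughly like the sketch below. The column names (`Gene name`, `Reliability`, `Location`) and the accepted labels are assumptions made for illustration and should be checked against the header of subcellular_location.csv; the sample rows are invented placeholders, not Atlas records.

```python
import csv
import io

# Assumed reliability labels that count as an accepted assignment.
ACCEPTED = {"validated", "supported", "approved"}


def filter_locations(csv_text):
    """Keep only rows whose reliability label is in ACCEPTED.

    Assumption: the file has 'Gene name', 'Reliability' and 'Location' columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = {}
    for row in reader:
        if row["Reliability"].strip().lower() in ACCEPTED:
            kept[row["Gene name"]] = row["Location"]
    return kept


# Placeholder rows in the assumed layout.
sample = (
    "Gene name,Reliability,Location\n"
    "GENE_A,Approved,Nucleus\n"
    "GENE_B,Uncertain,Cytosol\n"
    "GENE_C,Supported,Mitochondria\n"
)
locations = filter_locations(sample)
```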

Download all input data for this step here (1.25GB)


wp_step1_code.py

Python code required for mapping IDs and modules.


output_step1.zip (186MB)

Result files from mapping IDs and modules to datasets