Impact on Proteotype: Pipeline STEP12

STEP 12

Effect size estimation of sex and diet on proteins and modules

To determine to what extent protein complex abundance and stoichiometry are affected by sex or diet, we used L2-regularized multiple linear regression (ridge regression with a regularization parameter of 1), as implemented in the scikit-learn Python package (http://scikit-learn.org; Pedregosa et al., 2011). We compared models predicting complex abundance or complex stoichiometry from three sets of predictors: (i) genetic sex, (ii) diet, and (iii) genetic sex and diet combined. Model quality was assessed by the coefficient of determination (R²). This was done for every module considered (complexes and pathways), using both abundance data and module-normalized data. For pathways, we only considered those showing significantly higher co-abundance than co-abundances derived from a reshuffled dataset (FDR-corrected p-value < 0.1).
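The model comparison above can be sketched with scikit-learn's `Ridge` estimator. This is a minimal illustration, not the code in wp_step12_code.py: it assumes sex and diet are encoded as 0/1 indicators, that all subunits of a module are modelled jointly as one multi-output ridge regression, and that "global R²" means R² pooled over all subunits (scikit-learn's `multioutput="variance_weighted"`). The data are simulated, and `ridge_r2` is a hypothetical helper. For brevity the model is scored on the samples it was fitted on; the reported performance is cross-validated, as described in the next paragraph.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# Simulated toy data (hypothetical): 40 samples, one complex with 5 subunits.
rng = np.random.default_rng(0)
n_samples, n_subunits = 40, 5
sex = rng.integers(0, 2, n_samples)   # genetic sex encoded as 0/1 (assumption)
diet = rng.integers(0, 2, n_samples)  # diet encoded as 0/1 (assumption)
# Subunit abundances carry a simulated sex effect plus Gaussian noise; no diet effect.
Y = 0.8 * sex[:, None] + rng.normal(scale=0.5, size=(n_samples, n_subunits))

# The three predictor sets compared in the text.
predictor_sets = {
    "sex": sex[:, None].astype(float),
    "diet": diet[:, None].astype(float),
    "sex+diet": np.column_stack([sex, diet]).astype(float),
}

def ridge_r2(X, Y, alpha=1.0):
    """Fit a multi-output ridge model (all subunits jointly) and return R² pooled over subunits."""
    model = Ridge(alpha=alpha).fit(X, Y)
    return r2_score(Y, model.predict(X), multioutput="variance_weighted")

scores = {name: ridge_r2(X, Y) for name, X in predictor_sets.items()}
for name, r2 in scores.items():
    print(f"{name:>8}: R² = {r2:.3f}")
```

Because the simulated abundances depend only on sex, the "sex" and "sex+diet" models explain far more variance than the "diet" model.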

To estimate prediction performance, we used 10-fold cross-validation. Briefly, the dataset (per complex) was randomly split into ten groups of equal size; a model was iteratively trained on nine of them, and its performance was assessed on the held-out group. For each module, the median global R² is reported. The same analysis was conducted on a reshuffled dataset per complex, and the resulting performance metrics were used in a permutation-test approach to assign significance to the ridge regression coefficients obtained from the true data. Specifically, we used this null distribution to calculate an empirical FDR. Throughout the main text, the global R² derived for each module or protein is reported as the effect size, together with its FDR-corrected p-value.
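The cross-validation and permutation scheme can be sketched as follows. This is a hedged illustration under several assumptions, not the authors' implementation: the "median global R²" is taken as the median over the ten held-out folds, reshuffling is done by permuting samples of the abundance matrix relative to the sex/diet labels, and the permutation null is turned into an empirical p-value that is then Benjamini-Hochberg corrected across modules, which may differ from the empirical FDR calculation used in wp_step12_code.py. The helper names, `n_perm`, the module names and the simulated data are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

def cv_median_r2(X, Y, alpha=1.0, n_splits=10, seed=0):
    """Median held-out R² (pooled over subunits) across the folds of a K-fold split."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_r2 = []
    for train_idx, test_idx in kf.split(X):
        model = Ridge(alpha=alpha).fit(X[train_idx], Y[train_idx])
        pred = model.predict(X[test_idx])
        fold_r2.append(r2_score(Y[test_idx], pred, multioutput="variance_weighted"))
    return float(np.median(fold_r2))

def permutation_p_value(X, Y, n_perm=100, seed=0, **kwargs):
    """Empirical p-value: how often reshuffled data reach the observed median R²."""
    rng = np.random.default_rng(seed)
    observed = cv_median_r2(X, Y, **kwargs)
    # Permuting the rows of Y breaks the link between predictors and abundances.
    null = np.array([
        cv_median_r2(X, Y[rng.permutation(len(Y))], **kwargs)
        for _ in range(n_perm)
    ])
    # Adding 1 to numerator and denominator keeps the p-value above zero.
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values for a list of per-module p-values."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest rank downwards.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out

# Simulated example: one sex-driven module and one module without any effect.
rng = np.random.default_rng(1)
n = 60
sex = rng.integers(0, 2, n)
diet = rng.integers(0, 2, n)
X = np.column_stack([sex, diet]).astype(float)
modules = {
    "module_A": 1.0 * sex[:, None] + rng.normal(scale=0.5, size=(n, 4)),
    "module_B": rng.normal(scale=0.5, size=(n, 4)),
}
results = {name: permutation_p_value(X, Y) for name, Y in modules.items()}
q = benjamini_hochberg([p for _, p in results.values()])
for (name, (r2, p)), q_val in zip(results.items(), q):
    print(f"{name}: median R² = {r2:.3f}, p = {p:.3f}, BH-adjusted p = {q_val:.3f}")
```

With 100 permutations the smallest attainable p-value is 1/101, so in practice a larger number of reshuffles would be needed to resolve small FDR values.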

Data/code requirements for download

split_complex_input_files.zip (4.6MB)

input files for the machine-learning setup, split by complex

complex_quant_dictionary.json (14KB)

JSON dictionary of complex identifiers (with complex subunit abundances)

complex_stoichiometry_dictionary.json (14KB)

JSON dictionary of complex identifiers (with complex-normalized abundances)


wp_step12_code.py

Python code for the machine-learning setup used to estimate explained variance.

multivariate.zip (5.5MB)

temporary output files from the multivariate analysis