STEP 12

To understand the extent to which protein complex abundance and stoichiometry are affected by sex or diet, we used an L2-regularized multiple linear regression (ridge regression with a regularization parameter of 1), as implemented in the scikit-learn Python package (http://scikit-learn.org; Pedregosa et al., 2011). We compared models that predict complex abundance or complex stoichiometry, using as predictors: (i) genetic sex, (ii) diet, and (iii) the combination of genetic sex and diet together. We assessed the quality of each model by the coefficient of determination (R²). This was done for every module considered (complexes and pathways), using both abundance data and module-normalized data. For pathways, we considered only those showing high co-abundance (FDR-corrected p-value < 0.1) relative to co-abundances derived from a reshuffled dataset.
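The model comparison described above can be sketched as follows. This is a minimal illustration on synthetic data, not the code from wp_step12_code.py: the sample size, the binary coding of sex and diet, the effect sizes, and all variable names are assumptions made for the example. Only the use of scikit-learn's Ridge with a regularization parameter of 1 and the three predictor sets come from the text.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical toy design: 40 samples with balanced, binary-coded covariates.
n_samples = 40
sex = np.tile([0, 1], n_samples // 2)     # alternating 0/1
diet = np.repeat([0, 1], n_samples // 2)  # first half 0, second half 1

# Synthetic "complex abundance": strong diet effect, weak sex effect, noise.
abundance = 0.2 * sex + 1.0 * diet + rng.normal(0.0, 0.3, size=n_samples)

# The three predictor sets compared in the text.
predictor_sets = {
    "sex": np.column_stack([sex]),
    "diet": np.column_stack([diet]),
    "sex+diet": np.column_stack([sex, diet]),
}

# Fit one ridge model (regularization parameter 1) per predictor set
# and record its coefficient of determination on the fitted data.
r2_by_model = {}
for name, X in predictor_sets.items():
    model = Ridge(alpha=1.0)
    model.fit(X, abundance)
    r2_by_model[name] = model.score(X, abundance)

for name, r2 in r2_by_model.items():
    print(f"{name}: R2 = {r2:.3f}")
```

In this toy setting the diet-only model explains far more variance than the sex-only model, because the synthetic data were generated that way. With real data, the same loop would be run once per module and once per data type (abundance or module-normalized).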

To estimate prediction performance, we used a 10-fold cross-validation scheme. Briefly, we randomly separated the dataset (per complex) into ten groups of equal size, iteratively trained a model on nine of them, and assessed test performance on the held-out group. For each module, the median global R² is reported. The same analysis was conducted with a reshuffled dataset per complex, and the corresponding performance metrics were used in a permutation test to assign significance to the true ridge regression coefficients. Specifically, we used the distribution of metrics from the reshuffled data to calculate an empirical FDR. Throughout the main text, the global R² performance metric derived for the module or protein is reported as the effect size, with its respective FDR-corrected p-value.
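The cross-validation and reshuffling procedure can be sketched for a single module as shown below. This is an illustration under stated assumptions rather than the original implementation: the data are synthetic, the "global" R² is taken here to mean the R² computed over the pooled held-out predictions from all ten folds, the number of permutations (200) is arbitrary, and the helper name cross_validated_r2 is hypothetical. The sketch stops at an empirical p-value for one module; the FDR correction across modules is not shown.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict


def cross_validated_r2(X, y, seed=0):
    """R2 over pooled held-out predictions from 10-fold cross-validation."""
    cv = KFold(n_splits=10, shuffle=True, random_state=seed)
    y_pred = cross_val_predict(Ridge(alpha=1.0), X, y, cv=cv)
    return r2_score(y, y_pred)


rng = np.random.default_rng(0)

# Hypothetical toy data for one module: 60 samples, binary sex and diet.
n_samples = 60
sex = np.tile([0, 1], n_samples // 2)
diet = np.repeat([0, 1], n_samples // 2)
X = np.column_stack([sex, diet])
y = 0.2 * sex + 1.0 * diet + rng.normal(0.0, 0.3, size=n_samples)

# Performance on the true data.
observed_r2 = cross_validated_r2(X, y)

# Null distribution: repeat the same analysis after reshuffling the response.
n_permutations = 200  # assumed value, chosen only for the example
null_r2 = np.array(
    [cross_validated_r2(X, rng.permutation(y)) for _ in range(n_permutations)]
)

# Empirical p-value: fraction of reshuffled runs at least as good as the
# observed one, with the usual +1 correction to avoid a p-value of zero.
empirical_p = (1 + np.sum(null_r2 >= observed_r2)) / (1 + n_permutations)

print(f"observed R2 = {observed_r2:.3f}, empirical p = {empirical_p:.4f}")
```

Reshuffling the response breaks any association with sex and diet while keeping the cross-validation scheme unchanged, so the resulting R² values show what performance to expect by chance for that module.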

split_complex_input_files.zip (4.6MB)

input files for the machine-learning setup, split complexes

complex_quant_dictionary.json (14KB)

JSON dictionary for complex identifiers (with complex subunit abundances)

complex_stoichiometry_dictionary.json (14KB)

JSON dictionary for complex identifiers (with complex-normalized abundances)

Download all input data for this step here (4.6MB)


wp_step12_code.py

Python code for the machine-learning setup used to identify explained variances.


multivariate.zip (5.5MB)

temporary output-files from multivariate analysis

randomized_multivariate.zip (6MB)

temporary output-files from multivariate analysis after reshuffling datasets

combined_multivariate.zip (2.9MB)

temporary output-files from multivariate analysis, combining co-variates

combined_randomized_multivariate.zip (2.5MB)

temporary output-files from multivariate analysis (combining co-variates) after reshuffling datasets

summary_files.zip (1.5MB)

summarized result files from multivariate/combined analysis

Download all output data for this step here (18.5MB)