Contact: Tijana Milenkovic, tmilenko AT nd DOT edu
Reference: Khalique Newaz and Tijana Milenkovic (2020), Inference of the Dynamic Aging-related Biological Subnetwork via Network Propagation, submitted.
Summary: Gene expression (GE) data capture valuable condition-specific information ("condition" can mean a biological process, disease stage, age, patient, etc.). However, GE analyses ignore physical interactions between gene products, i.e., proteins. Since proteins function by interacting with each other, and since biological networks (BNs) capture these interactions, BN analyses are promising. However, current BN data fail to capture condition-specific information. Recently, GE and BN data have been integrated using network propagation (NP) to infer condition-specific BNs. However, existing NP-based studies result in a static condition-specific subnetwork, even though cellular processes are dynamic. A dynamic process of our interest is human aging. We use prominent existing NP methods in a new task of inferring a dynamic rather than static condition-specific (aging-related) subnetwork. Then, we study the evolution of network structure with age: we identify proteins whose network positions significantly change with age and predict them as new aging-related candidates. We validate the predictions via, e.g., functional enrichment analyses and literature search. Dynamic network inference via NP yields higher prediction quality than the only existing method for inferring a dynamic aging-related BN, which does not use NP.
Data: We integrate age-specific gene expression data with static PPI network data to generate dynamic aging-related subnetworks.
Software:
Computing gene expression values:
Given the gene expression data, we use the MAS 5.0 software package in R to compute gene expression values.

We convert probe IDs to gene symbols using a mapping file downloaded from the DAVID tool.

The log-transformed gene expression data that we use in our study can be downloaded from here. There are 37 text files corresponding to 37 different ages. Each file has three columns representing, from left to right, gene symbol, log-transformed gene expression value, and an indicator (P/A) identifying whether a gene is statistically significantly (p-value <= 0.04) expressed or not.
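
As a rough illustration of the three-column layout described above, the following Python sketch reads one age's records into a dictionary. It assumes whitespace-separated columns, no header row, and that "P" marks a significantly expressed gene and "A" a gene that is not; none of these details are confirmed here. The function name and the sample records are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def parse_expression_lines(lines: Iterable[str]) -> Dict[str, Tuple[float, bool]]:
    """Map gene symbol -> (log-transformed expression value, is_expressed).

    Assumptions (not confirmed by the data description): columns are
    whitespace-separated, there is no header row, and "P" means the gene
    is significantly expressed while "A" means it is not.
    """
    genes: Dict[str, Tuple[float, bool]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        symbol, value, call = fields
        genes[symbol] = (float(value), call == "P")
    return genes


# Made-up records for illustration only; real files come from the download.
sample = ["GENE1 7.42 P", "GENE2 3.10 A"]
parsed = parse_expression_lines(sample)
```

For a real file, the same function can be applied to an open file handle, e.g. parse_expression_lines(open(path)), where path points to one of the 37 per-age files.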

To determine whether a gene is significantly expressed at a given age, we use already published data from here. For each age, these data give the p-values for each gene across different probe-sample combinations, and we apply the "majority vote rule". Let gene g have x probes in a sample and let there be y samples at age A. Then, with a majority call of 0.5, gene g is considered expressed at age A if more than a 0.5 fraction (i.e., more than 50%) of the x * y probe-sample combinations are significantly expressed (p-value <= 0.04) at age A.
Generating dynamic aging-related subnetworks:
Given the gene expression data and the PPI network data, we use the induced subgraph approach, HotNet2, and NetWalk to create dynamic aging-related subnetworks. Using the induced subgraph approach, we integrate the significantly expressed genes with the PPI network data to obtain one dynamic aging-related subnetwork. Using each of HotNet2 and NetWalk, we integrate the gene expression data with the static PPI network data in several different ways, resulting in one dynamic subnetwork for HotNet2 and three dynamic subnetworks for NetWalk. Thus, we create five different dynamic aging-related subnetworks.
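
The majority vote rule can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name is hypothetical, and the input is assumed to be a flat list of the x * y detection p-values of one gene at one age.

```python
from typing import Sequence


def is_expressed(p_values: Sequence[float],
                 detection_threshold: float = 0.04,
                 majority_call: float = 0.5) -> bool:
    """Apply the majority vote rule to one gene at one age.

    p_values holds the detection p-values of all x * y probe-sample
    combinations (assumed input layout). The gene counts as expressed
    only if strictly more than `majority_call` of them are significant.
    """
    if not p_values:
        return False  # no measurements, so no evidence of expression
    significant = sum(1 for p in p_values if p <= detection_threshold)
    return significant / len(p_values) > majority_call


# 2 probes x 3 samples = 6 combinations; 4 of 6 are significant.
four_of_six = is_expressed([0.01, 0.02, 0.03, 0.04, 0.50, 0.90])
# Exactly half (3 of 6) is not "more than 50%", so the gene is not expressed.
three_of_six = is_expressed([0.01, 0.02, 0.03, 0.20, 0.50, 0.90])
```

Note the strict inequality: a gene with exactly half of its probe-sample combinations significant does not pass.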
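
To make the induced subgraph approach concrete, the sketch below builds one network snapshot per age by keeping only those PPI edges whose two endpoints are both expressed at that age. It is a minimal sketch under assumed data structures (an edge list and a dictionary of expressed genes per age); the function names are hypothetical, and it does not cover HotNet2 or NetWalk.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def induced_snapshot(ppi_edges: Iterable[Edge],
                     expressed_genes: Iterable[str]) -> Set[Edge]:
    """Edges of the static PPI network induced by the expressed genes."""
    expressed = set(expressed_genes)
    # Sort endpoints so that (u, v) and (v, u) are stored as the same edge.
    return {tuple(sorted((u, v)))
            for u, v in ppi_edges
            if u in expressed and v in expressed}


def dynamic_subnetwork(ppi_edges: Iterable[Edge],
                       expressed_by_age: Dict[int, Iterable[str]]) -> Dict[int, Set[Edge]]:
    """One induced snapshot per age; together they form the dynamic subnetwork."""
    edges = list(ppi_edges)
    return {age: induced_snapshot(edges, genes)
            for age, genes in sorted(expressed_by_age.items())}


# Toy data for illustration only.
ppi = [("A", "B"), ("B", "C"), ("C", "D")]
expressed = {20: ["A", "B", "C"], 30: ["B", "C", "D"]}
dynamic = dynamic_subnetwork(ppi, expressed)
```

In the toy example, the snapshot at age 20 keeps edges A-B and B-C, and the snapshot at age 30 keeps B-C and C-D.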

All five dynamic aging-related subnetworks can be downloaded from here.
Analysis of dynamic aging-related subnetworks:
We use an established framework for dynamic network analysis.

We provide a modified version of the original Unix implementation, which can be downloaded from here. The usage guidelines for this modified software are provided below. In the modified software, the only input is a dynamic subnetwork, and the output is the set of genes predicted by dynamic network analysis of that subnetwork.
Usage: ./generatepredictions.sh [detection_pv_threshold] [majority_call] [correlation_type] [rand_run] [min_ages_to_be_expressed] [pv_threshold] [output_dir]

[detection_pv_threshold] is the detection p-value threshold used to determine whether a gene is expressed in the gene expression data. Since we use already published data that provide the p-values (see above), we do not use this parameter in our modified version of the software. The modified code uses a constant value of 0.04.

[majority_call] is the parameter indicating a "majority vote rule". Since we already preprocess our data (see above) prior to its use in the modified software, we do not use this parameter in our modified software. The code uses a constant value of 0.5.

[correlation_type] indicates the correlation measure used in the program. The choices are P (Pearson correlation) and S (Spearman's correlation). We use P for this study.

[rand_run] is the number of random permutations for computing the p-value of aging-related predictions from dynamic subnetworks. We performed 999,999 random permutations in this study.

[min_ages_to_be_expressed] is the parameter used to filter out genes that are unexpressed at most of the ages. With a value x for this parameter, a gene is never considered aging-related if it is expressed at fewer than x different ages. To filter out genes that are expressed at fewer than 20% of the ages, we use the value 8 for this parameter (we have 37 different ages in our analysis, and 20% of 37 is 7.4).

[pv_threshold] is the p-value threshold used to determine aging-related predictions. We used a p-value threshold of 0.01 in this study.

[output_dir] is the directory that will contain all the outputs.
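
The value 8 chosen for [min_ages_to_be_expressed] follows from rounding 20% of 37 ages up to the next whole number. The short Python sketch below reproduces that arithmetic and the resulting filter; the function names are hypothetical and the snippet is independent of the provided software.

```python
import math
from typing import Sequence


def min_ages_threshold(num_ages: int, fraction: float = 0.2) -> int:
    """Smallest whole number of ages that is at least `fraction` of all ages."""
    return math.ceil(fraction * num_ages)


def passes_age_filter(expressed_flags: Sequence[bool], min_ages: int) -> bool:
    """True if the gene is expressed at `min_ages` or more ages."""
    return sum(expressed_flags) >= min_ages


threshold = min_ages_threshold(37)  # 20% of 37 is 7.4, rounded up to 8
kept = passes_age_filter([True] * 8 + [False] * 29, threshold)
dropped = passes_age_filter([True] * 7 + [False] * 30, threshold)
```

A gene expressed at exactly 8 of the 37 ages is kept, while a gene expressed at 7 ages is filtered out.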

Note. [output_dir]/intermediatefiles/dynamicppis should contain the dynamic subnetwork that is being analyzed. The modified software contains all of the input dynamic subnetworks that we use in this study.
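
To show how [rand_run] and [pv_threshold] typically interact, here is a generic permutation-test sketch in Python. It is an illustration under stated assumptions, not the procedure implemented in the software: the test statistic (absolute Pearson correlation between age and some per-age value of a gene), the shuffling scheme, and the (hits + 1) / (rand_run + 1) estimator are all assumptions, and the function names are hypothetical. Under this estimator, 999,999 permutations would give a smallest attainable p-value of 1 in 1,000,000.

```python
import math
import random
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def permutation_p_value(ages: Sequence[float], values: Sequence[float],
                        rand_run: int, seed: int = 0) -> float:
    """Empirical p-value of |correlation(ages, values)| under shuffling.

    Assumed estimator: (hits + 1) / (rand_run + 1), where a hit is a
    shuffled series at least as strongly correlated as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(pearson(ages, values))
    shuffled = list(values)
    hits = 0
    for _ in range(rand_run):
        rng.shuffle(shuffled)
        if abs(pearson(ages, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (rand_run + 1)


ages = list(range(20, 30))
trend = [2.0 * a for a in ages]   # toy series that changes steadily with age
flat = [1.0] * len(ages)          # toy series that does not change with age
p_trend = permutation_p_value(ages, trend, rand_run=999)
p_flat = permutation_p_value(ages, flat, rand_run=99)
```

A gene would then be reported as a prediction when its p-value falls below the chosen threshold (0.01 in this study); the small rand_run values here are only to keep the example fast.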
Example: Given a dynamic aging-related subnetwork, the aging-related predictions can be computed with the following command:

./generatepredictions.sh 0.04 0.5 P 999999 8 0.01 NetWalk/Output

The command will generate aging-related predictions based on the NetWalk-based dynamic aging-related subnetwork, the Pearson correlation measure, 999,999 random permutations, genes expressed in at least 8 ages, and a p-value threshold of 0.01. The computed aging-related predictions will be saved in "NetWalk/Output/agingpredictions".
