Inference of the Dynamic Aging-related Biological Subnetwork via Network Propagation

 

Contact: Tijana Milenkovic, tmilenko AT nd DOT edu

Reference: Khalique Newaz and Tijana Milenkovic (2020), Inference of the Dynamic Aging-related Biological Subnetwork via Network Propagation, submitted.

Summary: Gene expression (GE) data capture valuable condition-specific information ("condition" can mean a biological process, disease stage, age, patient, etc.). However, GE analyses ignore physical interactions between gene products, i.e., proteins. Since proteins function by interacting with each other, and since biological networks (BNs) capture these interactions, BN analyses are promising. However, current BN data fail to capture condition-specific information. Recently, GE and BN data have been integrated using network propagation (NP) to infer condition-specific BNs. However, existing NP-based studies result in a static condition-specific subnetwork, even though cellular processes are dynamic. A dynamic process of interest to us is human aging. We use prominent existing NP methods in a new task of inferring a dynamic rather than static condition-specific (aging-related) subnetwork. Then, we study the evolution of network structure with age: we identify proteins whose network positions significantly change with age and predict them as new aging-related candidates. We validate the predictions via, e.g., functional enrichment analyses and a literature search. Dynamic network inference via NP yields higher prediction quality than the only existing method for inferring a dynamic aging-related BN, which does not use NP.

Data: We integrate age-specific gene expression data with static PPI network data, to generate dynamic aging-related subnetworks.

  • Aging-related gene expression data from Berchtold et al. (2008).

  • Static PPI data from HPRD. The processed HPRD PPI data used in this study can be downloaded from here.

Software:

  • Computing gene expression values:

    • Given the gene expression data, we use the MAS 5.0 software package in R to compute gene expression values.

    • We convert probe IDs to gene symbols using a mapping file downloaded from the DAVID tool.

    • The log-transformed gene expression data that we use in our study can be downloaded from here. There are 37 text files corresponding to 37 different ages. Each file has three columns representing, from left to right, the gene symbol, the log-transformed gene expression value, and an indicator (P/A) identifying whether the gene is significantly expressed (p-value <= 0.04) or not.

    • To determine whether a gene is significantly expressed at a given age, we use already published data from here. For each age, these data give the p-values for each gene across different probe-sample combinations. We follow the "majority vote rule": let gene g have x probes in a sample, and let there be y samples at age A. Then, with a majority call of 0.5, gene g is considered expressed at age A if more than a 0.5 fraction (i.e., 50%) of its x * y probe-sample combinations are significantly expressed (p-value <= 0.04) at age A.
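
The majority vote rule above can be illustrated with a minimal Python sketch. This is not the code used in the study; the function name `is_expressed` and the example p-values are hypothetical, and treating a gene with no measurements as unexpressed is an assumption made here.

```python
from typing import Sequence

DETECTION_PV_THRESHOLD = 0.04  # detection p-value cutoff stated in the text
MAJORITY_CALL = 0.5            # majority-vote fraction stated in the text


def is_expressed(p_values: Sequence[float],
                 pv_threshold: float = DETECTION_PV_THRESHOLD,
                 majority_call: float = MAJORITY_CALL) -> bool:
    """Return True if more than `majority_call` of the x * y probe-sample
    p-values for one gene at one age are significant (p <= pv_threshold)."""
    if not p_values:
        return False  # assumption: no measurements means "not expressed"
    significant = sum(1 for p in p_values if p <= pv_threshold)
    return significant / len(p_values) > majority_call


# Hypothetical gene with x = 2 probes and y = 3 samples at one age (6 p-values).
example = [0.01, 0.03, 0.04, 0.20, 0.50, 0.02]
print(is_expressed(example))  # True: 4 of 6 p-values are <= 0.04
```

Note that the comparison is strict: a gene with exactly half of its probe-sample combinations significant is not considered expressed.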

  • Generating dynamic aging-related subnetworks:

    • Given the gene expression data and the PPI network data, we use the induced subgraph approach, HotNet2, and NetWalk to create dynamic aging-related subnetworks. Using the induced subgraph approach, we integrate the significantly expressed genes with the PPI network data to obtain one dynamic aging-related subnetwork. Using each of HotNet2 and NetWalk, we integrate the gene expression data with the static PPI network data in several different ways, resulting in one dynamic subnetwork for HotNet2 and three dynamic subnetworks for NetWalk. Thus, we create five different dynamic aging-related subnetworks in total.

    • All five dynamic aging-related subnetworks can be downloaded from here.
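
As a rough illustration of the induced subgraph approach only (HotNet2 and NetWalk are separate tools and are not reproduced here), the sketch below builds one network snapshot per age by keeping the PPI edges whose two endpoints are both expressed at that age. It assumes that the dynamic subnetwork is a series of such age-specific snapshots; the function names, gene labels, and ages are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def induced_subnetwork(ppi_edges: Iterable[Edge], expressed: Set[str]) -> List[Edge]:
    """Keep only the PPI edges whose two endpoints are both expressed."""
    return [(u, v) for u, v in ppi_edges if u in expressed and v in expressed]


def dynamic_subnetwork(ppi_edges: Iterable[Edge],
                       expressed_by_age: Dict[int, Set[str]]) -> Dict[int, List[Edge]]:
    """Build one induced snapshot per age from the static PPI network."""
    edges = list(ppi_edges)  # materialize once so the edges can be reused per age
    return {age: induced_subnetwork(edges, genes)
            for age, genes in sorted(expressed_by_age.items())}


# Toy static PPI network and per-age sets of expressed genes (hypothetical).
ppi = [("A", "B"), ("B", "C"), ("C", "D")]
expressed_by_age = {20: {"A", "B", "C"}, 45: {"B", "C", "D"}}

snapshots = dynamic_subnetwork(ppi, expressed_by_age)
for age, edges in snapshots.items():
    print(age, edges)
```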

  • Analysis of dynamic aging-related subnetworks:

    • We use an established framework for dynamic network analysis.

    • We provide a modified version of the original Unix implementation, which can be downloaded from here. The usage guideline for this modified software is provided below. In the modified software, the only input is a dynamic subnetwork, and the output is the set of genes predicted by dynamic network analysis of the input dynamic subnetwork.

    • Usage: ./generate-predictions.sh [detection_pv_threshold] [majority_call] [correlation_type] [rand_run] [min_ages_to_be_expressed] [pv_threshold] [output_dir]

      • [detection_pv_threshold] is the detection p-value threshold used to determine whether a gene is expressed according to the gene expression data. Since we use already published data that give information about the p-values (see above), we do not use this parameter in our modified version of the software. The modified code uses a constant value of 0.04.

      • [majority_call] is the parameter indicating a "majority vote rule". Since we already preprocess our data (see above) prior to its use in the modified software, we do not use this parameter in our modified software. The code uses a constant value of 0.5.

      • [correlation_type] indicates the correlation measure used in the program. The choices are P (Pearson correlation) and S (Spearman's correlation). We use P for this study.

      • [rand_run] is the number of random permutations for computing the p-value of aging-related predictions from dynamic subnetworks. We performed 999,999 random permutations in this study.

      • [min_ages_to_be_expressed] is the parameter used to filter out genes that are unexpressed in most of the ages. With a value of x for this parameter, a gene is never considered to be aging-related if it is expressed in fewer than x different ages. With the goal of filtering out genes that are expressed in fewer than 20% of the ages, we use the value 8 for this parameter (because we have 37 different ages in our analysis, and 20% of 37 is 7.4).

      • [pv_threshold] is the p-value threshold used to determine aging-related predictions. We used a p-value threshold of 0.01 in this study.

      • [output_dir] is the directory that will contain all the outputs.

      • Note. [output_dir]/intermediate-files/dynamic-ppis should contain the dynamic subnetwork that is being analyzed. The modified software contains all of the input dynamic subnetworks that we use in this study.

    • Example: Given a dynamic aging-related subnetwork, the aging-related predictions can be computed with the following command:

      • ./generate-predictions.sh 0.04 0.5 P 999999 8 0.01 NetWalk/Output

      • The command will generate aging-related predictions based on the NetWalk-based dynamic aging-related subnetwork, the Pearson correlation measure, 999,999 random permutations, genes expressed in at least 8 ages, and a p-value threshold of 0.01. The computed aging-related predictions will be saved in "NetWalk/Output/aging-predictions".
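
To make the roles of [min_ages_to_be_expressed], [rand_run], and [pv_threshold] more concrete, here is a minimal Python sketch of an expression-count filter and a permutation-based p-value. It is not the provided software. It assumes that some per-age measure of a protein's network position (a hypothetical input here) is correlated with age using Pearson correlation, and that the p-value is the fraction of permutations whose absolute correlation is at least as large as the observed one, with the common +1 correction. This page does not state the exact statistic or p-value formula, so all function names and both assumptions are illustrative.

```python
import math
import random
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def passes_min_ages(expressed_flags: Sequence[bool], min_ages: int = 8) -> bool:
    """Keep a gene only if it is expressed in at least `min_ages` ages."""
    return sum(bool(f) for f in expressed_flags) >= min_ages


def permutation_p_value(ages: Sequence[float], values: Sequence[float],
                        rand_run: int = 999, seed: int = 0) -> float:
    """Empirical p-value of |correlation(ages, values)| under random shuffles.

    Assumption: two-sided test with the (hits + 1) / (rand_run + 1) correction.
    """
    rng = random.Random(seed)
    observed = abs(pearson(ages, values))
    shuffled = list(values)
    hits = 0
    for _ in range(rand_run):
        rng.shuffle(shuffled)  # break any association between value and age
        if abs(pearson(ages, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (rand_run + 1)


# Hypothetical protein whose network-position measure grows steadily over 37 ages.
ages = list(range(20, 57))
values = [0.5 * a + 1.0 for a in ages]
p = permutation_p_value(ages, values, rand_run=999)
print(p, p <= 0.01)
```

The sketch uses 999 permutations to stay fast; the study used 999,999. Under the assumed formula, 999,999 permutations would make the smallest attainable p-value 1/1,000,000.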