Network analysis of synonymous codon usage
Milenkovic Lab
Contact: Tijana Milenkovic, tmilenko AT nd DOT edu

Introduction: Most amino acids are encoded by multiple synonymous codons. For a given amino acid, some of its synonymous codons are used much more rarely than others. Analyses of the positions of such rare codons in protein sequences have revealed that rare codons can impact co-translational protein folding and that the positions of some rare codons are evolutionarily conserved. Analyses of the positions of rare codons in proteins' 3-dimensional structures, which are richer in biochemical information than sequences alone, might further explain the role of rare codons in protein folding. We analyze a protein set recently annotated with codon usage information, considering non-redundant proteins with sufficient structural information. We model the proteins' structures as networks and study potential differences between the network positions of amino acids encoded by evolutionarily conserved rare, evolutionarily non-conserved rare, and commonly used codons. In 84% of the proteins, at least one of the three codon categories occupies significantly more or less network-central positions than the other codon categories. Many of the protein groups showing different codon centrality trends (i.e., different types of relationships between the network positions of the three codon categories) are enriched in unique biological functions, suggesting a possible link between codon usage, protein folding, and protein function.

Reference: Khalique Newaz, Gabriel Wright, Jacob Piland, Jun Li, Patricia Clark, Scott Emrich, and Tijana Milenkovic (2019), Network analysis of synonymous codon usage, submitted.
Data set: We start with a recent large data set of ∼280,000 proteins spanning 76 species for which codon usage information is available. From it, we consider the subset of proteins that are non-redundant (at most 90% sequence-similar to each other) and that have sufficient 3-dimensional structural information in the Protein Data Bank. This results in 63 proteins spanning seven species. For each of the 63 proteins, we provide a mapping file and the corresponding protein structure network (PSN).
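To illustrate the kind of analysis described above, the following is a minimal sketch of building a PSN from residue coordinates and comparing the average centrality of the three codon categories. It is not the published pipeline. The distance cutoff, the use of one coordinate per residue, closeness as the centrality measure, the category labels, and the toy coordinates are all assumptions made for illustration. The formats of the provided mapping and PSN files are not described here, so the sketch does not read them, and it does not perform the statistical significance testing mentioned in the introduction.

```python
import math
from collections import deque
from statistics import mean


def build_psn(coords, cutoff):
    """Build a PSN as an adjacency dict: residues are nodes, and two
    residues are linked if their coordinates lie within `cutoff`.
    The cutoff value is an assumption, not taken from the study."""
    adj = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def closeness(adj, source):
    """Closeness centrality of `source` within its connected component:
    (number of reachable nodes) / (sum of shortest-path lengths)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search over unweighted edges
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def mean_centrality_by_category(adj, categories):
    """Average closeness of the residues in each codon category."""
    grouped = {}
    for node, category in categories.items():
        grouped.setdefault(category, []).append(closeness(adj, node))
    return {category: mean(values) for category, values in grouped.items()}


# Toy input (hypothetical): five residues spaced 4 units apart on a line.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0),
          (12.0, 0.0, 0.0), (16.0, 0.0, 0.0)]
# Hypothetical category labels for each residue's codon.
categories = {0: "common", 1: "common", 2: "conserved_rare",
              3: "nonconserved_rare", 4: "common"}

psn = build_psn(coords, cutoff=5.0)  # yields the path 0-1-2-3-4
means = mean_centrality_by_category(psn, categories)
for category, value in sorted(means.items()):
    print(f"{category}: {value:.3f}")
```

In this toy chain the middle residue is the most central, so its category has the highest average closeness. On real data, such per-category averages would only be a starting point for a proper statistical comparison.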