Current Research Topics:

Analysis of RNA-seq and single-cell RNA-seq data
Machine learning and data mining
Deep-learning neural networks
Feature extraction and classification on networks

Jun Li's research is supported by multiple NIH grants.

Current Students:

Cheng Liu, PhD, 2016 - present
Hongyu Guo, PhD, 2016 - present
Chuanqi Wang, PhD, 2016 - present
Zixuan Song, PhD, 2017 - present

Previous Students:

Martin Barron, PhD, 2013 - 2018
Alicia Specht, PhD, 2012 - 2017
Ge Jiang, Master's, graduated in 2017
Can Shao, Master's, graduated in 2014
Tingting Zhang, Master's, graduated in 2013

Publications:

Chuanqi Wang*, Lifu Xiao, Chen Dai, Anh H. Nguyen, Laurie E. Littlepage, Zachary D. Schultz, Jun Li** (2020) A Statistical Approach of Background Removal and Spectrum Identification for SERS Data Scientific Reports, accepted. (* Wang is a PhD student of Li; ** Li is the corresponding author.)
Lifu Xiao, Chuanqi Wang, Chen Dai, Laurie E Littlepage, Jun Li, Zachary D Schultz (2019) Untargeted Tumor Metabolomics with Liquid Chromatography—Surface-Enhanced Raman Spectroscopy, Angewandte Chemie, accepted.
Chuanqi Wang*, Jun Li** (2019) SINC: a scale-invariant deep-neural-network classifier for bulk and single-cell RNA-seq data, Bioinformatics, accepted. (* Wang is a PhD student of Li; ** Li is the corresponding author.)
Md Sazzad Hassan, Fiona Williams, Niranjan Awasthi, Margaret A. Schwarz, Roderich E. Schwarz, Jun Li, and Urs von Holzen. (2019) Combination effect of lapatinib with foretinib in HER2 and MET co-activated experimental esophageal adenocarcinoma, Scientific Reports, 9(1):17608.
Ricardo Romero-Moreno, Kimberly J. Curtis, Thomas R. Coughlin, MariaCristina Miranda-Vergara, Shourik Dutta, Aishwarya Natarajan, Beth A. Facchine, Kristen M. Jackson, Lukas Nystrom, Jun Li, William Kaliney, Glen L. Niebur, and Laurie E. Littlepage. (2019) The CXCL5/CXCR2 axis is sufficient to promote breast cancer colonization during bone metastasis, Nature Communications, 10(1):4404.
Cheng Liu*, Kyung T Han, Jun Li** (2019) Compromised Item Detection for Computerized Adaptive Testing, Frontiers in Psychology, 10: 829. (* Liu is a PhD student of Li; ** Li is the corresponding author.)
Jun Li*, Lamere AT (2019) DiPhiSeq: Robust comparison of expression levels on RNA-Seq data with large sample sizes, Bioinformatics, accepted. (* corresponding author.)
Kang DS*, Barron M*, Lovin DD, Cunningham JM, Eng MW, Chadee DD, Li J, Severson DW (2018) A transcriptomic survey of the impact of environmental stress on response to dengue virus in the mosquito, Aedes aegypti, PLoS Negl Trop Dis, 12(6): e0006568. (* joint first authors; Barron is a PhD student of Li.)
Wei Qiu, Na Shang, Thomas Bank, Xianzhong Ding, Jun Li, Peter Breslin, and Baomin Shi (2018) Caspase-3 suppresses diethylnitrosamine-induced hepatocyte death, compensatory proliferation and hepatocarcinogenesis through inhibiting p38 activation, Cell Death & Disease, 9(5): 558.
Andrew Baker, Debra Wyatt, Maurizio Bocchetta, Jun Li, Aleksandra Filipovic, Andrew Green, Daniel Peiffer, Suzanne Fuqua, Lucio Miele, Kathy S. Albain, Clodia Osipo (2018) Notch-1-PTEN-ERK1/2 Signaling Axis Promotes HER2+ Breast Cancer Cell Proliferation and Stem Cell Survival, Oncogene, 37(33): 4489.
Md Sazzad Hassan, Niranjan Awasthi, Jun Li, Fiona Williams, Margaret A. Schwarz, Roderich E. Schwarz, and Urs von Holzen (2018) Superior therapeutic efficacy of nanoparticle albumin bound paclitaxel over cremophor-bound paclitaxel in experimental esophageal adenocarcinoma, Translational Oncology, 11(2): 426-435.
Zonggao Shi, Chunyan Li, Laura Tarwater, Jun Li, Yang Li, William Kaliney, Darshan Chandrashekar and M. Sharon Stack (2018), RNA-seq Reveals the Overexpression of IGSF9 in Endometrial Cancer, Journal of Oncology, 2018: 2439527.
Chen Dai, Jennifer Arceo, James Michael Arnold, Arun Sreekumar, Norman J Dovichi, Jun Li, Laurie Littlepage (2018), Metabolomics of oncogene-specific metabolic reprogramming during breast cancer, Cancer & Metabolism, 6(1): 5.
Martin Barron*, Siyuan Zhang, Jun Li** (2018) A sparse differential clustering algorithm for tracing cell type changes via single-cell RNA-sequencing data, Nucleic Acids Res, 46(3): e14. (* Barron is a PhD student of Li; ** Li is the corresponding author.)
Faisal FE, Newaz K, Chaney JL, Li J, Emrich SJ, Clark PL, Milenkovic T (2017) GRAFENE: Graphlet-based alignment-free network approach integrates 3D structural and sequence (residue order) data to improve protein structural comparison, Scientific Reports, 7(1):14890.
Chaney JL, Steele A, Carmichael R, Rodriguez A, Specht AT, Ngo K, Li J, Emrich S, Clark PL (2017) Widespread position-specific conservation of synonymous rare codons within coding sequences, PLoS Comput Biol, 13(5):e1005531.
Md Sazzad Hassan, Niranjan Awasthi, Jun Li, Margaret A. Schwarz, Roderich E. Schwarz, Urs von Holzen (2017) A Novel Intraperitoneal Metastatic Xenograft Mouse Model for Survival Outcome Assessment of Esophageal Adenocarcinoma, PLoS ONE, 12(2):e0171824.
Alicia Specht* and Jun Li** (2017) LEAP: Constructing gene co-expression networks for single-cell RNA-sequencing data using pseudotime ordering, Bioinformatics, 33 (5): 764-766. (* Specht is a PhD student of Li; ** Li is the corresponding author.)
Qian Yang, Yue Hu, Jun Li, Xuegong Zhang (2017) ulfasQTL: an ultra-fast method of composite splicing QTL analysis, BMC Genomics, 18(Suppl 1):963.
Martin Barron* and Jun Li** (2016) Identifying and removing the cell-cycle effect from single-cell RNA-Sequencing data, Scientific Reports, 6:33892. (* Barron is a PhD student of Li; ** Li is the corresponding author.)
Rispah T. Sawe, Maggie Kerper, Sunil Badve, Jun Li, Mayra Sandoval-Cooper, Jingmeng Xie, Zonggao Shi, Kirtika Patel, David Chumba, Ayub Ofulla, Jenifer Prosperi, Katherine Taylor, M. Sharon Stack, Simeon Mining and Laurie E. Littlepage (2016) Aggressive breast cancer in western Kenya has early onset, high proliferation, and immune cell infiltration, BMC Cancer, 16:204.
Alicia Specht* and Jun Li** (2015) Estimation of gene co-expression from RNA-Seq count data, Statistics and Its Interface, 8(4):507-515. (* Specht is a PhD student of Li; ** Li is the corresponding author.)
Can Shao*, Jun Li**, and Ying Cheng (2015) Detection of Test Speededness Using Change-Point Analysis, Psychometrika, accepted. (* Shao is a master's student of Li; ** Li is a corresponding author.)
Lynn Roy, Serene J Samyesudhas, Martin Carrasco, Jun Li, Stancy Joseph, Richard Dahl, and Karen D Cowden Dahl (2014) ARID3B increases ovarian tumor burden and is associated with a cancer stem cell gene signature, Oncotarget, 5 (18): 8355-8366.
Miranda Burnette, Teresa Brito-Robinson, Jun Li*, and Jeremiah Zartman (2014) An inverse small molecule screen to design a chemically defined medium supporting long-term growth of Drosophila cell lines, Molecular BioSystems, 10(10): 2713-2723. (*: Li is a corresponding author.)
Alayne Brunner*, Jun Li*, Xiangqian Guo, Sushama Varma, Shirley Zhu, Rui Li, Robert Tibshirani, and Robert B West (2014) A shared transcriptional program in early breast neoplasias despite genetic and clinical distinctions, Genome Biology, 15(5): R71. (* joint first authors)
Chunlei Li, Jun Li, Holly V Goodson, and Mark S Alber (2014) Microtubule dynamic instability: the role of cracks between protofilaments, Soft Matter, 10(20): 2069-2080.
Jun Li* and Robert Tibshirani (2013) Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Statistical Methods in Medical Research, 22(5): 519-36. (*: Li is the corresponding author.)
        Associated software: SAM (samr) and npseq.
Wan Y, Qu K, Ouyang Z, Kertesz M, Li J, Tibshirani R, Makino DL, Nutter RC, Segal E, and Chang HY (2012) Genome-wide Measurement of RNA Folding Energies, Molecular Cell, 48(2):169-81.
Jun Li*, Daniela M Witten, Iain Johnstone, and Robert Tibshirani (2012) Normalization, testing, and false discovery rate estimation for RNA-sequencing data. Biostatistics, 13(3):523-38. (*: Li is the corresponding author.)
        Associated software: PoissonSeq.
Lewis Z Hong, Jun Li, Anne Schmidt-Kuntzel, Wesley C. Warren, and Gregory S. Barsh (2011) Digital gene expression for non-model organisms. Genome Research, 21(11): 1905–1915.
Jun Li, Hui Jiang, and Wing H Wong (2010) Modeling non-uniformity in short-read rates in RNA-Seq data. Genome Biology 11(5): R50.
        Associated software: mseq.
Jun Li, Michael Q Zhang, Xuegong Zhang (2006) A New Method for Detecting Human Recombination Hotspots and Its Applications to the HapMap ENCODE Data. The American Journal of Human Genetics 79:628-639.
        Associated software: HotspotFisher.
Jing Zhang, Fei Li, Jun Li, Michael Q Zhang, Xuegong Zhang (2004) Evidence and characteristics of putative human alpha recombination hotspots. Human Molecular Genetics 13:2823-2828.

Software:

sabarsi: Statistical Approach of BAckground Removal and Spectrum Identification for SERS Data

        Description: Implements the algorithm described in Wang C et al, "A Statistical Approach of Background Removal and Spectrum Identification for SERS Data" (Scientific Reports 2020). Sabarsi is a pipeline for SERS (surface-enhanced Raman scattering) data analysis that includes background removal, signal detection, signal integration, and cross-experiment comparison. The background removal algorithm, the very first step of SERS data analysis, takes into account the change of background shape.
        Availability: The R version is available on CRAN. To install it, run install.packages("sabarsi") at the R prompt.

SINC: Scale-invariant deep-neural-network based classification of RNA-seq and single-cell RNA-seq data

        Description: Implements the algorithm described in Wang C, Li J "SINC: a scale-invariant deep-neural-network classifier for bulk and single-cell RNA-seq data" (Bioinformatics 2019). Unlike other classification algorithms for RNA-seq data, SINC gives the same results under any normalization (estimation of sequencing depths), and thus does not require normalization at all.
        Availability: Python source code and documents.
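
        Example (illustrative sketch): The scale-invariance property described above can be shown with a small toy. This is not SINC's network or its code. It assumes that sequencing depth acts as a single multiplicative factor per sample, and it uses a centered log transform, chosen here only for illustration, to show how features can be made independent of that factor. The function name and the data are hypothetical.

```python
import math

def centered_log_ratio(counts):
    """Map positive counts to log values centered on their mean.

    Multiplying every count by the same constant c adds log(c) to each log
    value and to their mean, so the centered output is unchanged.
    """
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical expression profile of one sample (all counts positive).
sample = [10.0, 200.0, 35.0, 4.0]
# The same sample "sequenced three times deeper".
deeper = [3.0 * c for c in sample]

features_a = centered_log_ratio(sample)
features_b = centered_log_ratio(deeper)

# The two feature vectors agree up to floating-point error.
max_diff = max(abs(a - b) for a, b in zip(features_a, features_b))
print(max_diff < 1e-9)
```

        Any classifier fed such features would return the same label whatever depth estimate is used, which is the behavior the description attributes to SINC.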

DiPhiSeq: Robust comparison of expression levels

        Description: Implements the algorithm described in Li J, Lamere AT, "DiPhiSeq: Robust comparison of expression levels on RNA-Seq data with large sample sizes" (Bioinformatics 2019). This algorithm detects not only genes that show different average expressions ("differential expression", DE), but also genes that show different diversities of expressions in different groups ("differentially dispersed", DD). DD genes can be important clinical markers. DiPhiSeq uses a redescending penalty on the quasi-likelihood function, and thus has superior robustness against outliers and other noise.
        Availability: The R version is available on CRAN. To install it, run install.packages("DiPhiSeq") at the R prompt.
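
        Example (illustrative sketch): The difference between DE and DD genes can be made concrete with a toy calculation. The sketch below is not DiPhiSeq's robust estimator. It assumes a negative-binomial-style relation, var = mu + phi * mu^2, and uses plain method-of-moments estimates, which, unlike the redescending-penalty approach described above, are sensitive to outliers. The function name and the counts are made up.

```python
from statistics import mean, variance

def moment_estimates(counts):
    """Return (mu, phi) from var = mu + phi * mu**2 by method of moments."""
    mu = mean(counts)
    var = variance(counts)  # sample variance
    phi = max((var - mu) / mu ** 2, 0.0)  # truncate negative estimates at zero
    return mu, phi

# Hypothetical counts of one gene in two groups of samples.
group_a = [85, 110, 95, 115, 90, 105]
group_b = [40, 160, 70, 130, 20, 180]

mu_a, phi_a = moment_estimates(group_a)
mu_b, phi_b = moment_estimates(group_b)

# Same average expression, very different dispersion: a DD gene, not a DE gene.
print(mu_a == mu_b)
print(phi_a < phi_b)
```

        A test that compares only means would miss this gene, while a dispersion comparison would flag it.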

SparseDC: sparse differential clustering

        Description: Implements the algorithm described in Barron, M., Zhang, S. and Li, J. "A sparse differential clustering algorithm for tracing cell type changes via single-cell RNA-sequencing data" (Nucleic Acids Res. 2018). This algorithm clusters samples from two different populations, links the clusters across the conditions and identifies marker genes for these changes. The package was designed for scRNA-Seq data but is also applicable to many other data types: simply replace cells with samples and genes with variables. The package also contains functions for estimating the parameters for SparseDC as outlined in the paper.
        Availability: The R version is available on CRAN. To install it, run install.packages("SparseDC") at the R prompt. Usage: The manual is available on CRAN.

ccRemover: cell-cycle remover

        Description: ccRemover detects and removes the cell-cycle effect from single-cell RNA-Seq data. The current method (scLVM) for removing the cell-cycle effect is unable to effectively identify this effect and has a high risk of removing other biological components of interest, compromising downstream analysis. ccRemover is a new method that reliably identifies the cell-cycle effect and removes it. ccRemover preserves other biological signals of interest in the data and thus can serve as an important pre-processing step for many scRNA-Seq data analyses.
        Availability: The R version is available on CRAN. To install it, run install.packages("ccRemover") at the R prompt. Usage: The manual is available on CRAN.

LEAP: Lag-based Expression Association for Pseudotime-series

        Description: LEAP constructs gene co-expression networks for single-cell RNA-sequencing data using pseudotime ordering.
        Availability: The R version is available on CRAN. To install it, run install.packages("LEAP") at the R prompt. Usage: The manual is available on CRAN.
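
        Example (illustrative sketch): As a rough illustration of the lag-based idea suggested by the package name, the sketch below takes cells already sorted by pseudotime and reports the largest Pearson correlation between one gene and lagged copies of another. It is a toy built on assumptions, not the LEAP package's algorithm or API: the function names, the window handling, and the data are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def max_lagged_correlation(x, y, max_lag):
    """Return (best_corr, best_lag) over lags 0..max_lag.

    x and y are expression values of two genes, with cells ordered by
    pseudotime. At lag k, x[t] is paired with y[t + k].
    """
    best_corr, best_lag = -2.0, 0
    for lag in range(max_lag + 1):
        window = len(x) - lag
        corr = pearson(x[:window], y[lag:lag + window])
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return best_corr, best_lag

# Hypothetical data: gene_y repeats gene_x's pattern two cells later.
gene_x = [1.0, 3.0, 7.0, 4.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0]
gene_y = [1.0, 1.0] + gene_x[:-2]

corr, lag = max_lagged_correlation(gene_x, gene_y, max_lag=4)
print(lag, round(corr, 3))
```

        A strong correlation at a positive lag hints that changes in the first gene precede changes in the second along pseudotime, which is the kind of directed edge a co-expression network can record.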

SAM: Significance Analysis of Microarrays

        Description: SAM is a statistical technique for finding significant genes in a set of microarray experiments or sequencing experiments. I am an author of the current major new release 4.0, July 1, 2011. In this new version, SAM is able to handle sequencing data, using the "SAMSeq" method described in my paper in Statistical Methods in Medical Research. The Excel interface has also been updated. Note that it is not compatible with Excel 2010, as Microsoft changed the R-Excel API in Excel 2010. We are re-scripting SAM, but for now please use earlier versions of Excel. Also, for very large datasets, we recommend using the R version (R package samr) to reduce computing time.
        Availability: The R version is available on CRAN. To install it, run install.packages("samr") at the R prompt. The Excel interface is available on Robert Tibshirani's webpage.
        Usage: The manual of the R-version is available on CRAN. The manual for the Excel interface, which is more detailed, is available on Robert Tibshirani's webpage.

PoissonSeq

        Description: PoissonSeq is an R package for the significance analysis of sequencing data based on a Poisson log-linear model. It implements all methods described in my paper in Biostatistics. Briefly, we estimate the sequencing depths of the experiments using a new method based on the Poisson goodness-of-fit statistic, calculate a score statistic on the basis of a Poisson log-linear model, and then estimate the false discovery rate using a modified version of the permutation plug-in method.
        Availability: Available on CRAN. To install it, run install.packages("PoissonSeq") at the R prompt.
        Usage: The manual for the R version is available on CRAN. Here are more detailed instructions with data (.zip file, unzip it before using it).
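
        Example (illustrative sketch): The depth-estimation step described above can be sketched as follows. This is a simplified reading of the description, not the PoissonSeq package code. It assumes that, for a non-differential gene i, the count in experiment j is roughly Poisson with mean depths[j] times the gene's total count; it computes a goodness-of-fit statistic per gene and re-estimates the depths from genes whose statistic falls in a middle quantile range. The function name, the quantile bounds, and the iteration count are illustrative choices.

```python
def estimate_depths(counts, lower=0.25, upper=0.75, n_iter=5):
    """Estimate relative sequencing depths (summing to 1) from a count matrix.

    counts[i][j] is the read count of gene i in experiment j.
    """
    n_genes, n_exp = len(counts), len(counts[0])
    gene_totals = [sum(row) for row in counts]
    kept = [i for i in range(n_genes) if gene_totals[i] > 0]
    depths = []
    for _ in range(n_iter):
        # Depth of experiment j: its share of reads among the kept genes.
        total = sum(gene_totals[i] for i in kept)
        depths = [sum(counts[i][j] for i in kept) / total for j in range(n_exp)]
        # Poisson goodness-of-fit statistic of every expressed gene.
        gof = {}
        for i in range(n_genes):
            if gene_totals[i] == 0:
                continue
            gof[i] = sum(
                (counts[i][j] - depths[j] * gene_totals[i]) ** 2
                / (depths[j] * gene_totals[i])
                for j in range(n_exp)
            )
        # Keep genes whose statistic lies in the middle quantile range,
        # so strongly differential genes do not distort the depths.
        ranked = sorted(gof, key=gof.get)
        lo, hi = int(lower * len(ranked)), int(upper * len(ranked))
        kept = ranked[lo:hi]
    return depths

# Hypothetical data: experiment 2 is sequenced twice as deep as experiment 1,
# and the last gene is strongly differential.
counts = [
    [10, 20], [50, 100], [30, 60], [5, 10],
    [20, 40], [80, 160], [40, 80], [100, 10],
]
depths = estimate_depths(counts)
print([round(d, 3) for d in depths])
```

        Using raw column totals here would be pulled off by the last gene; restricting the estimate to well-fitting genes recovers the 1:2 depth ratio built into the toy data.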

mseq

        Description: mseq is an R package for modeling non-uniformity in short-read rates in RNA-Seq data. This package implements all the methods in my paper in Genome Biology, including both the iterative glm procedure for the Poisson linear model and the training procedure of the MART model, as well as the cross-validation for both methods.
        Availability: Available on CRAN. To install it, run install.packages("mseq") at the R prompt.
        Usage: The manual of the R-version is available on CRAN.
        Data: data_top100.zip (unzip it before use).

npSeq

        Description: npSeq is an R package for the significance analysis of sequencing data. It implements exactly the method described in my paper in Statistical Methods in Medical Research. The statistic used by npSeq is exactly the same as that in SAM 4.0. The only difference is that npSeq uses symmetric cutoffs, while SAM uses asymmetric cutoffs. Therefore, for some datasets, the significant genes obtained by SAM are either all up-regulated or all down-regulated, but npSeq almost always gives significant genes that include both up-regulated and down-regulated genes.
        Availability: This package has NOT been submitted to CRAN. You can only download it from here.
        Usage: The manual is here. Here is a more detailed instruction with data (.zip file, unzip it before using it), which details how to install and use the package.
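
        Example (illustrative sketch): The effect of symmetric versus asymmetric cutoffs can be seen on a handful of made-up scores. The sketch below is not code from npSeq or samr, and the cutoff values are chosen by hand purely for illustration, whereas both packages derive their cutoffs from the data. The function names are hypothetical.

```python
def call_symmetric(scores, cutoff):
    """Call genes whose score exceeds the cutoff in absolute value."""
    return {gene for gene, s in scores.items() if abs(s) > cutoff}

def call_asymmetric(scores, lower, upper):
    """Call genes below the lower cutoff or above the upper cutoff."""
    return {gene for gene, s in scores.items() if s < lower or s > upper}

# Hypothetical per-gene scores (positive = up-regulated).
scores = {"g1": 4.1, "g2": 3.2, "g3": -2.6, "g4": 0.3, "g5": -0.8}

symmetric_calls = call_symmetric(scores, cutoff=2.5)
asymmetric_calls = call_asymmetric(scores, lower=-3.5, upper=3.0)

print(sorted(symmetric_calls))   # includes up- and down-regulated genes
print(sorted(asymmetric_calls))  # only up-regulated genes in this toy case
```

        With the same scores, the asymmetric rule can place one cutoff beyond every gene on that side, which is how a call set ends up entirely up-regulated or entirely down-regulated.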

HotspotFisher

        Description: HotspotFisher is a package for detecting recombination hotspots from population polymorphism data. It implements the method described in my paper in AJHG. Written in standard C++, it can be compiled and executed on various operating systems, such as Linux/Unix and Windows. HotspotFisher uses a multi-hotspot model and the truncated weighted pairwise log-likelihood (TWPLL), so it can detect multiple hotspots in a region. HotspotFisher can be applied directly to both phased/haplotype and unphased/genotype data, with arbitrary levels of missing data.
        Source (for Linux): HotspotFisher_Linux_source. Unzip it and follow instructions in how_to_complie.txt to compile it.
        Executable: Linux version (for 64-bit computers), Linux version (for 32-bit computers), Windows version (for 32-bit computers).
        Usage: pdf document.
        Data: Sample Input, Simulation Data.