Current Research Topics:
Analysis of next generation sequencing data, especially RNA-Seq and single-cell RNA-Seq
Data mining, classification, clustering
Current Research Support:
NIH R01 (PI). Total: $1.11 million. PIs: Li, Clark, Emrich, Milenkovic.
A news story about this grant: College of Science Department of ACMS
NIH R33 (co-investigator). Total: $1.02 million. PI: Schultz.
NIH R01 (co-investigator). Total: $1.39 million. PI: Zhang.
NIH R01 (co-investigator). Total: $1.76 million. PI: Qiu.
NIH R56 (co-investigator). Total: $0.562 million. PI: Severson.
Notre Dame Faculty Research Support Initiation Grant (PI). Total: $10K. PI: Li.
Indiana CTSI Design and Biostatistics Program Pilot Grant (PI). Total: $11K. PI: Li.
Alicia Specht, PhD, 2012 - present
Martin Barron, PhD, 2013 - present
Cheng Liu, PhD, 2016 - present
Hongyu Zhao, PhD, 2016 - present
Chuanqi Wang, PhD, 2016 - present
Dai Cheng, PhD, 2013 - present (as second mentor; co-advised with Dr. Littlepage)
Ge Jiang, Master's
Tingting Zhang, Master's, graduated in 2013
Can Shao, Master's, graduated in 2014
Martin Barron* and Jun Li** (2016)
Identifying and removing the cell-cycle effect from single-cell RNA-Sequencing data,
Scientific Reports, accepted. (* Barron is a PhD student of Li; ** Li is the corresponding author.)
We are very excited about this work! We have developed a new method called "ccRemover" that removes the cell-cycle effect from single-cell RNA-Seq data. The big advantage of ccRemover is its high reliability: while it often removes the cell-cycle effect completely, it always preserves other biological signals of interest in the data. We expect it to serve as an important pre-processing step for most scRNA-Seq data analyses! Expect the R package to be available on Bioconductor soon!
Rispah T. Sawe, Maggie Kerper, Sunil Badve, Jun Li, Mayra Sandoval-Cooper, Jingmeng Xie, Zonggao Shi, Kirtika Patel, David Chumba, Ayub Ofulla, Jenifer Prosperi, Katherine Taylor, M. Sharon Stack, Simeon Mining and Laurie E. Littlepage (2016) Aggressive breast cancer in western Kenya has early onset, high proliferation, and immune cell infiltration, BMC Cancer, 16:204.
Alicia Specht* and Jun Li** (2015) Estimation of gene co-expression from RNA-Seq count data, Statistics and Its Interface, 8(4):507-515. (* Specht is a PhD student of Li; ** Li is the corresponding author.)
Can Shao*, Jun Li**, and Ying Cheng (2015) Detection of Test Speededness Using Change-Point Analysis, Psychometrika, accepted. (* Shao is a master's student of Li; ** Li is a corresponding author.)
Lynn Roy, Serene J Samyesudhas, Martin Carrasco, Jun Li, Stancy Joseph, Richard Dahl, and Karen D Cowden Dahl (2014) ARID3B increases ovarian tumor burden and is associated with a cancer stem cell gene signature, Oncotarget, 5 (18): 8355-8366.
Miranda Burnette, Teresa Brito-Robinson, Jun Li*, and Jeremiah Zartman (2014) An inverse small molecule screen to design a chemically defined medium supporting long-term growth of Drosophila cell lines, Molecular BioSystems, 10(10): 2713-2723. (*: Li is a corresponding author.)
Alayne Brunner*, Jun Li*, Xiangqian Guo, Sushama Varma, Shirley Zhu, Rui Li, Robert Tibshirani, and Robert B West (2014) A shared transcriptional program in early breast neoplasias despite genetic and clinical distinctions, Genome Biology, 15(5): R71. (* joint first authors)
Chunlei Li, Jun Li, Holly V Goodson, and Mark S Alber (2014) Microtubule dynamic instability: the role of cracks between protofilaments, Soft Matter, 10(20): 2069-2080.
Jun Li* and Robert Tibshirani (2013) Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Statistical Methods in Medical Research, 22(5): 519-36. (*: Li is the corresponding author.)
Associated software: SAM (samr) and npseq.
Wan Y, Qu K, Ouyang Z, Kertesz M, Li J, Tibshirani R, Makino DL, Nutter RC, Segal E, and Chang HY (2012) Genome-wide Measurement of RNA Folding Energies, Molecular Cell, 48(2):169-81.
Jun Li*, Daniela M Witten, Iain Johnstone, and Robert Tibshirani (2012) Normalization, testing, and false discovery rate estimation for RNA-sequencing data. Biostatistics, 13(3):523-38. (*: Li is the corresponding author.)
Associated software: PoissonSeq.
Lewis Z Hong, Jun Li, Anne Schmidt-Kuntzel, Wesley C. Warren, and Gregory S. Barsh (2011) Digital gene expression for non-model organisms. Genome Research, 21(11): 1905–1915.
Jun Li, Hui Jiang, and Wing H Wong (2010) Modeling non-uniformity in short-read rates in RNA-Seq data. Genome Biology 11(5): R50.
Associated software: mseq.
Jun Li, Michael Q Zhang, Xuegong Zhang (2006) A New Method for Detecting Human Recombination Hotspots and Its Applications to the HapMap ENCODE Data. The American Journal of Human Genetics 79:628-639.
Associated software: HotspotFisher.
Jing Zhang, Fei Li, Jun Li, Michael Q Zhang, Xuegong Zhang (2004) Evidence and characteristics of putative human alpha recombination hotspots. Human Molecular Genetics 13:2823-2828.
Submitted (This list does not include manuscripts in preparation):
Jun Li* and Hui Jiang.
Robust estimation of isoform expression with RNA-Seq data. (*: Li is the corresponding author.)
Alicia Specht* and Jun Li**. LEAP: Constructing gene co-expression networks for single-cell RNA-sequencing data using pseudotime ordering. (* Specht is a PhD student of Li; ** Li is the corresponding author.)
Fazle E Faisal, Julie L Chaney, Jun Li, Patricia L Clark, Tijana Milenkovic. Network approach integrates 3D structural and sequence data to improve protein classification.
Julie L. Chaney, Aaron Steele, Rory Carmichael, Alicia T. Specht*, Jun Li, Scott Emrich, Patricia L. Clark. Widespread Position-specific Conservation of Synonymous Rare Codons within Coding Sequences. (*: Specht is a PhD student of Li.)
Chen Dai*, Jennifer Arceo, James Arnold, Junmin Wu, Arun Sreekumar, Norman J Dovichi, Jun Li, Laurie E Littlepage. Novel metabolic networks identify metabolites with distinct spatial distribution and prognostic value in breast cancer. (*: Dai is a PhD student of Li.)
Qian Yang, Yue Hu, Jun Li, Xuegong Zhang. ulfasQTL: an Ultra-Fast Method of Splicing QTL Analysis.
Andrew Baker, Debra Wyatt, Ianina Bognanni, Kinnari Pandya, Maurizio Bocchetta, Jun Li, Clodia Osipo. The Role of Notch-1, PTEN, and Akt Signaling in Trastuzumab Resistant HER2+ Breast Cancer Stem Cells.
Jennifer L. Starner-Kreinbrink, Jonathan P. Renn, Julie L. Chaney, Alicia T. Specht, Jorge A. Giron, Jun Li, Patricia L. Clark. One-pot assembly of diverse proteins into macroscopic, rope-like fibers under physiological conditions.
Description: ccRemover detects and removes the cell-cycle effect from single-cell RNA-Seq data.
The current method (scLVM) for removing the cell-cycle effect is unable to effectively identify
this effect and has a high risk of removing other biological components of interest,
compromising downstream analysis. ccRemover is a new method that reliably
identifies the cell-cycle effect and removes it. ccRemover preserves other biological
signals of interest in the data and thus can serve as
an important pre-processing step for many scRNA-Seq data analyses.
Availability: We are preparing the manual. The package will soon be available on Bioconductor!
Description: LEAP constructs gene co-expression networks for single-cell RNA-sequencing data using pseudotime ordering.
Availability: The R version is available on CRAN. Type install.packages("LEAP") at the R prompt to install it.
Usage: The manual is available on CRAN.
Description: SAM is a statistical technique for finding
significant genes in a set of microarray experiments or
sequencing experiments. I am an author of the current major release, version 4.0 (July 1, 2011).
In this version, SAM can handle sequencing data using the "SAMSeq" method described
in my paper in Statistical Methods in Medical Research.
The Excel interface has also been updated. Note that it is not
compatible with Excel 2010, because Microsoft changed the R-Excel API in
Excel 2010. We are re-scripting SAM, but for now please use an earlier
version of Excel. For very large datasets, we recommend using
the R version (R package samr) to reduce computing time.
Availability: The R version is available on CRAN. Type install.packages("samr") at the R prompt to install it. The Excel interface is available on Robert Tibshirani's webpage.
Usage: The manual of the R-version is available on CRAN. The manual for the Excel interface, which is more detailed, is available on Robert Tibshirani's webpage.
PoissonSeq is an R package for the significance analysis of sequencing data based on a Poisson log-linear model.
It implements all methods described in my paper in Biostatistics.
Briefly, we estimate the sequencing depths of the
experiments using a new method based on the Poisson goodness-of-fit
statistic, calculate a score statistic on the basis of a Poisson
log-linear model, and then estimate the false discovery rate using a
modified version of the permutation plug-in method.
Availability: Available on CRAN. Type install.packages("PoissonSeq") at the R prompt to install it.
Usage: The manual of the R version is available on CRAN. Here are more detailed instructions with data (.zip file, unzip it before use).
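The depth-estimation step described above can be illustrated with a small sketch. This is not the PoissonSeq code (the package itself is written in R); it is a simplified, pure-Python illustration under stated assumptions: the function name estimate_depths, the number of iterations, and the choice to keep the middle half of genes ranked by goodness-of-fit are all hypothetical choices made for this example.

```python
def estimate_depths(counts, n_iter=5, keep=(0.25, 0.75)):
    """Illustrative sequencing-depth estimate (assumed simplification).

    counts[j][i] is the read count of gene j in sample i.
    Returns one relative depth per sample; the depths sum to 1.
    """
    n_samples = len(counts[0])
    gene_totals = [sum(row) for row in counts]
    expressed = [j for j, total in enumerate(gene_totals) if total > 0]

    def depths_from(subset):
        # Depth of sample i = its share of all reads from the genes in subset.
        pooled = sum(gene_totals[j] for j in subset)
        return [sum(counts[j][i] for j in subset) / pooled
                for i in range(n_samples)]

    subset = expressed
    for _ in range(n_iter):
        depths = depths_from(subset)
        # Poisson goodness-of-fit of each gene under "no differential expression".
        gof = {}
        for j in expressed:
            stat = 0.0
            for i in range(n_samples):
                expected = depths[i] * gene_totals[j]
                if expected > 0:
                    stat += (counts[j][i] - expected) ** 2 / expected
            gof[j] = stat
        # Keep genes in the middle of the ranking (hypothetical quantile range).
        ranked = sorted(gof, key=gof.get)
        lo, hi = int(keep[0] * len(ranked)), int(keep[1] * len(ranked))
        subset = ranked[lo:hi] or ranked
    return depths_from(subset)


# Toy data: sample 2 is sequenced twice as deeply as sample 1,
# and gene 4 is strongly differentially expressed.
toy_counts = [[10, 20], [20, 40], [30, 60], [5, 10],
              [1000, 100], [40, 80], [8, 16], [12, 24]]
toy_depths = estimate_depths(toy_counts)
```

In this toy example the differentially expressed gene would dominate a depth estimate based on total counts, whereas the iterative estimate is driven by the genes that fit the null model well.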
mseq is an R package for modeling non-uniformity in short-read rates
in RNA-Seq data. This package implements all the methods in my paper
in Genome Biology, including both the iterative glm procedure
for the Poisson linear model and the training procedure of the MART
model, as well as the cross-validation for both methods.
Availability: Available on CRAN. Type install.packages("mseq") at the R prompt to install it.
Usage: The manual of the R-version is available on CRAN.
Data: data_top100.zip (unzip it before use).
npSeq is an R package for the significance analysis of sequencing data.
It implements exactly the method described in my paper in
Statistical Methods in Medical Research.
The statistic used by npSeq is exactly the same as that in SAM 4.0. The only
difference is that npSeq uses symmetric cutoffs, while SAM uses
asymmetric cutoffs. Therefore, for some datasets, all significant genes
obtained by SAM are either all up-regulated or all down-regulated, but npSeq
almost always gives significant genes that include both up-regulated
genes and down-regulated genes.
Availability: This package has NOT been submitted to CRAN. You can only download it from here.
Usage: The manual is here. Here are more detailed instructions with data (.zip file, unzip it before use), which explain how to install and use the package.
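The difference between symmetric and asymmetric cutoffs mentioned above can be illustrated with a toy sketch. This is not code from npSeq or samr (both are R packages); the function names, the scores, the expected order statistics, and the values of the cutoff and delta are hypothetical, and the two rules are not calibrated to the same false discovery rate. The asymmetric rule is a simplified version of the SAM-style procedure that compares ordered observed scores with their expected values.

```python
import math


def symmetric_calls(scores, cutoff):
    """Call gene i significant when |score| >= cutoff (same threshold in both tails)."""
    return [i for i, s in enumerate(scores) if abs(s) >= cutoff]


def asymmetric_calls(scores, expected_sorted, delta):
    """Simplified SAM-style rule with separate upper and lower cutoffs.

    expected_sorted holds the expected order statistics of the scores under
    the null (for example, averaged over permutations), in ascending order.
    """
    observed = sorted(scores)
    n = len(observed)
    # Start where the expected score is closest to zero.
    start = min(range(n), key=lambda k: abs(expected_sorted[k]))
    upper, lower = math.inf, -math.inf
    # Walk right: the first gene exceeding its expectation by more than delta
    # sets the upper cutoff.
    for k in range(start, n):
        if observed[k] - expected_sorted[k] > delta:
            upper = observed[k]
            break
    # Walk left: the first gene falling below its expectation by more than
    # delta sets the lower cutoff.
    for k in range(start, -1, -1):
        if expected_sorted[k] - observed[k] > delta:
            lower = observed[k]
            break
    return [i for i, s in enumerate(scores) if s >= upper or s <= lower]


# Toy scores for seven genes and hypothetical expected order statistics.
toy_scores = [0.5, -2.5, 4.0, 0.0, -1.0, 1.5, -0.5]
toy_expected = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
sym = symmetric_calls(toy_scores, cutoff=2.5)
asym = asymmetric_calls(toy_scores, toy_expected, delta=1.0)
```

With these toy values the asymmetric rule never finds a lower cutoff, so it reports only the up-regulated gene, while the symmetric rule reports one gene in each tail.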
HotspotFisher is a package for detecting recombination hotspots from population
polymorphism data. It implements the method described in my paper in
AJHG. Written in standard C++, it can be compiled and executed on
various operating systems, such as Linux/Unix and Windows. HotspotFisher uses a
multi-hotspot model and the truncated weighted pairwise log-likelihood (TWPLL), so
it can detect multiple hotspots in a region. HotspotFisher can be applied directly to both phased/haplotype and unphased/genotype
data, with arbitrary levels of missing data.
Source (for Linux): HotspotFisher_Linux_source. Unzip it and follow instructions in how_to_complie.txt to compile it.
Executable: Linux version (for 64-bit computers), Linux version (for 32-bit computers), Windows version (for 32-bit computers).
Usage: pdf document.
Data: Sample Input, Simulation Data.