I am a professor in the Department of Applied and Computational Mathematics and Statistics at the University of Notre Dame.

I received my bachelor's degree in Automation and my master's degree in Pattern Recognition and Intelligent Systems, both from the Department of Automation at Tsinghua University. In 2012, I received my PhD in Statistics from Stanford University under the supervision of Professor Robert Tibshirani. I joined Notre Dame as a tenure-track Assistant Professor after graduation, was promoted to Associate Professor with tenure in 2017, and to Full Professor in 2020.

I am a statistician, and my main research interest is developing statistical and computational methods and algorithms for big datasets that challenge traditional statistical methods and computational power. In particular, my research in recent years has focused on:

1. RNA-seq, and especially single-cell RNA-seq, data analysis. RNA-seq is an ultra-high-throughput technique that generates huge, high-dimensional, and noisy data. Traditional (bulk) RNA-seq data are a great platform for testing your abilities with "classical" statistical models such as generalized linear models and robust statistics. The more recent single-cell RNA-seq data, named "Breakthrough of 2018" by Science, are on the other hand an ideal platform for testing your abilities in data mining, such as clustering, classification, and variable selection, because of their much larger sample sizes. These data are also much noisier, which makes working on them even more challenging and exciting!

2. Deep-learning neural networks. There is no better time to work in this field. Deep learning has beaten classical machine learning methods in some applications but still struggles on many other kinds of data, especially data whose input features have no spatial or temporal order, so that convolution does not make sense. Another big problem with neural networks is their poor interpretability. I'm exploring how to build high-performance, interpretable neural network models for such data.

3. Network data analysis. Feature extraction from networks is an interesting problem. One can use techniques such as graphlets to summarize a network as a vector of features or, in more detail, use graphlet orbits to summarize a network as a matrix of features. I'm working on efficient methods for extracting these features and for using them in classification.

4. Collaborations with people from other departments. I am interested in applying my statistical modeling and data mining techniques to all kinds of real data, and I have developed collaborations with researchers in biology, biochemistry, and other departments, both at Notre Dame and at other institutions. Please do not hesitate to contact me if you have difficult data; I may be the right person to solve your headaches.

5. Fishing. :-) Well, this is not research, but to me it is serious and has brought me as much fun as research. When I was a five-year-old boy, I built my first rod, line, and hooks by myself and caught my first fish, a crucian carp. I have never stopped fishing since. In California, I went both freshwater and saltwater fishing. Most remarkably, I caught a 7 lb 0 oz rainbow trout on 4 lb line (on a Thomas Buoyant spoon), and a 16 lb striper in the San Francisco Bay from the shore (on a piece of squid). My fishing journey in Indiana has started, and here are some of my catches in the St Joseph River: a 5 lb 1 oz largemouth bass (on a Texas rig), a 13 lb steelhead (on a jerkbait), an 11 lb chinook salmon (on a jerkbait), and a 7 lb steelhead on 4 lb line and a 2-inch minnow (when I was targeting crappie)!