Siyuan Jiang

(Pronounced: Suh-you-en Chiang)

Doing Research in Software Engineering

I am a PhD candidate at the University of Notre Dame, in Dr. Collin McMillan's lab.
I will graduate in May 2018 (CV: pdf).

My main research interests lie in software engineering (SE), particularly program comprehension and program analysis. My recent research projects focus on adapting deep learning technologies to address SE problems.

I am also passionate about teaching; my teaching statement appears below.

Summary of My Research Projects

My most recent project generates commit messages using Neural Machine Translation (NMT) (Publications 16 and 14). In this project, I applied state-of-the-art NMT to generating natural language descriptions of the modifications that programmers make to software artifacts. The input to my approach is a diff file produced by a file differencing tool (e.g., git diff). The output is an English sentence that describes the changes listed in the diff file. I presented this work at ASE'17 in Urbana-Champaign, Illinois; the transcript and slides are available online.

I am also passionate about empirical software engineering. While most of my projects involve large amounts of data, two of them focus primarily on collecting and analyzing data. One study examines the accuracy of dynamic program slicing, a fundamental program analysis technique (Publications 7 and 5). The other study examines how programmers behave during debugging (Publication 11); I presented this study at ICSME'17 in Shanghai.

I have also collaborated extensively with my labmates. Two projects I worked on closely apply machine learning classification algorithms, such as support vector machines (SVMs) and artificial neural networks, to automatic software documentation. One project uses an artificial neural network to identify the code fragments that are most important to document (Publication 15); for this project, I visited ABB and surveyed professional programmers about how they rate the importance of classes. The other project uses an SVM to generate extractive summaries of developer-client meetings (Publication 12). I extracted most of the features from the meeting transcripts, ran the machine learning algorithms with my colleague Ameer, and conducted a reproducibility study on the AMI data set.

I also built Docio (in collaboration with Ameer), a documentation tool that inserts actual input/output values into API documentation (Publication 13).

Publications

17 Rrezarta Krasniqi, Siyuan Jiang, and Collin McMillan, "TraceLab Components for Generating Extractive Summaries of User Stories," in Proc. of the 33rd IEEE International Conference on Software Maintenance and Evolution, Artifacts Track (ICSME'17), Shanghai, China, Sept. 17-24, 2017.
preprint, online appendix

16 Siyuan Jiang, Ameer Armaly, and Collin McMillan, "Automatically Generating Commit Messages from Diffs Using Neural Machine Translation," in Proc. of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE'17).
arXiv:1708.09492, ACM DL, online appendix

15 Paul W. McBurney, Siyuan Jiang, Marouane Kessentini, Nicholas A. Kraft, Ameer Armaly, Mohamed Wiem Mkaouer, and Collin McMillan, "Towards Prioritizing Documentation Effort," in IEEE Transactions on Software Engineering (TSE), accepted May 15, 2017. Journal-First Presentation at ESEC/FSE'17.
IEEE Xplore, preprint

14 Siyuan Jiang and Collin McMillan, "Towards Automatic Generation of Short Summaries of Commits," in Proc. of the 25th International Conference on Program Comprehension, ERA Track (ICPC'17).
arXiv:1703.09603, ACM DL, online appendix

13 Siyuan Jiang, Ameer Armaly, Collin McMillan, Qiyu Zhi, and Ronald Metoyer, "Docio: Documenting API Input/Output Examples," in Proc. of the 25th International Conference on Program Comprehension, Tool Demo Track (ICPC'17).
arXiv:1703.09613, IEEE Xplore, online appendix

12 Paige Rodeghero, Siyuan Jiang, Ameer Armaly, and Collin McMillan, "Detecting User Story Information in Developer-Client Conversations to Generate Extractive Summaries," in Proc. of the 39th International Conference on Software Engineering (ICSE'17).
preprint, IEEE Xplore, online appendix


11 Siyuan Jiang, Collin McMillan, and Raul Santelices, "Do Programmers Do Change Impact Analysis?" in Empirical Software Engineering (EMSE), vol. 22, no. 2, April 2017, pp. 631-669. Journal-First Presentation at ICSME'17.
EMSE link, Springer online version, online appendix

10 Haipeng Cai, Raul Santelices, and Siyuan Jiang, "Prioritizing Change-Impact Analysis via Semantic Program-Dependence Quantification," in IEEE Transactions on Reliability, vol. 65, no. 3, Sept. 2016, pp. 1114-1132.


9 Raul Santelices, Haipeng Cai, Siyuan Jiang, and Yiji Zhang, "Advanced Dependence Analysis for Software Testing, Debugging, and Evolution," in IEEE Reliability Digest, pp. 18-24, 2014.

8 Ting Su, Geguang Pu, Bin Fang, Jifeng He, Jun Yan, Siyuan Jiang, and Jianjun Zhao, "Automated Coverage-Driven Test Data Generation Using Dynamic Symbolic Execution," in Proc. of the 8th IEEE International Conference on Software Security and Reliability (SERE'14).

7 Siyuan Jiang, Raul Santelices, Mark Grechanik, and Haipeng Cai, "On the Accuracy of Forward Dynamic Slicing and its Effects on Software Maintenance," in Proc. of the 14th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM'14).

6 Haipeng Cai, Siyuan Jiang, Raul Santelices, Ying-Jie Zhang, and Yiji Zhang, "SENSA: Sensitivity Analysis for Quantitative Change-Impact Prediction," in Proc. of the 14th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM'14).

5 Siyuan Jiang, Raul Santelices, Haipeng Cai, and Mark Grechanik, "How Accurate Is Dynamic Program Slicing? An Empirical Approach to Compute Accuracy Bounds," in Proc. of the 8th IEEE International Conference on Software Security and Reliability (SERE'14), Fast Abstract.

4 Raul Santelices, Yiji Zhang, Haipeng Cai, and Siyuan Jiang, "Change-Effects Analysis for Evolving Software," in Advances in Computers, Vol. 93, Chapter 5, 2014.


3 Raul Santelices, Yiji Zhang, Siyuan Jiang, Haipeng Cai, and Ying-Jie Zhang, "Quantitative Program Slicing: Separating Statements by Relevance," in Proc. of the 2013 International Conference on Software Engineering, New Ideas and Emerging Results Track (ICSE'13).

2 Raul Santelices, Yiji Zhang, Haipeng Cai, and Siyuan Jiang, "DUA-Forensics: A Fine-Grained Dependence Analysis and Instrumentation Framework Based on Soot," in Proc. of the ACM SIGPLAN International Workshop on the State of the Art in Java Program Analysis (SOAP'13).


1 Xiao Yu, Shuai Sun, Geguang Pu, Siyuan Jiang, and Zheng Wang, "A Parallel Approach to Concolic Testing with Low-Cost Synchronization," in Electronic Notes in Theoretical Computer Science, vol. 274, Aug. 2011.

Teaching
My teaching philosophy is to value different types of learners by paying attention to the challenges that students from different backgrounds may face. I have worked closely with many students while assisting in weekly labs for two undergraduate courses with more than 100 students. I have also mentored students at all levels in different settings: a first-year Ph.D. student in my research project, an undergraduate student in his research course, and, remotely, three M.S. students at the University of Illinois at Chicago. Through these experiences, I came to understand the needs of students from different backgrounds and developed communication skills that help me teach and mentor students effectively.

As a female computer scientist, I understand the challenges that female STEM students may face, and I am dedicated to supporting women in computer science. I have mentored a female undergraduate student and a female junior Ph.D. student; with both, I formed close relationships and discussed important issues, such as communicating with advisors.

Favorite Seminar

Besides software engineering, I am particularly interested in data science. My favorite seminar is "The Future of Data Analysis," presented by Dr. Edward Tufte.