Education
- Ph.D. in Computer and Information Science, University of Pennsylvania, 2004. Advisor: Aravind K. Joshi. Dissertation: Evaluation of Grammar Formalisms for Applications to Natural Language Processing and Biological Sequence Analysis. Published as: Grammars for Language and Genes: Theoretical and Empirical Investigations, Springer, 2012.
- S.M. in Computer Science, Harvard University, 1997.
- A.B. cum laude in Computer Science, Harvard University, 1997.
Honors and awards
- Social impact award, with F. Faisal et al., ACL 2024
- Outstanding paper award and senior area chair award, with C. Taguchi, ACL 2024
- Outstanding faculty teaching award, Notre Dame Dept. of Computer Science and Engineering, 2017
- Best paper award, with W. Wang and K. Knight, NAACL HLT 2009
- Best paper award, ACL 2005
- Morris and Dorothy Rubinoff Award, 2005, University of Pennsylvania, for a dissertation that represents an advance in innovative application of computer technology
- Phi Beta Kappa, fall 1996, and Detur Book Prize, 1994, Harvard University, awarded to top 5% of class
Professional experience
- 2014–present: Associate professor, University of Notre Dame, Department of Computer Science and Engineering.
- 2018–2021: Adjoint professor, Toyota Technological Institute at Chicago.
- 2014–2017: Adjunct associate professor, USC Department of Computer Science.
- 2013–2014: Project leader, USC Information Sciences Institute.
- 2007–2014: Research assistant professor, USC Department of Computer Science.
- 2006–2012: Computer scientist, USC Information Sciences Institute.
- 2004–2005: Postdoctoral researcher, Univ. of Maryland Institute for Advanced Computer Studies.
Advising and thesis committees
- Former postdocs: Victoria Fossum (2013, now at Google)
- Former PhD students: Ashish Vaswani (2014), Tomer Levinboim (2017, now at Google), Antonis Anastasopoulos (2019, now assistant professor at George Mason), Arturo Argueta (2019, now at Apple), Kenton Murray (2020, now at JHU HLT CoE), Justin DeBenedetto (2021, now assistant professor at Villanova), Toan Nguyen (2021, now at Amazon), Brian DuSell (2023, now at ETH Zürich)
- Current PhD students: Darcey Riley, Stephen Bothwell, Kenneth Sible, Aarohi Srivastava, Chihiro Taguchi, Andy Yang
- Former master's students: Xing Jie Zhong (now at Google)
- Summer interns: Michael Bloodgood (Delaware), Wei Ho (Princeton), Paramveer Dhillon (Penn), John DeNero (Berkeley), Amittai Axelrod (Washington), Adam Pauls (Berkeley), Yoav Goldberg (Ben Gurion), Anne Irvine (JHU), Xuchen Yao (JHU), Ada Wan (CUNY), Jackie Lee (MIT), Arvind Neelakantan (Columbia/UMass), Xiang Zhou (Shanghai Jiaotong)
- Undergraduate research: Cindy Xinyi Wang (PhD CMU, now at Google), Chan Hee Song (now at Ohio State), Alison Lui, Greta Rauch, Colin McDonald, and others
- PhD thesis committees
- external: Adam Lopez (Univ. of Maryland), 2008; Hendra Setiawan (Natl. Univ. of Singapore), 2008; John DeNero (Berkeley), 2010; Kevin Gimpel (CMU), 2012; Baskaran Sankaran (Simon Fraser), 2013; Andrea Gesmundo (Geneva), 2013; Qing Dou (USC), 2015; Liangyou Li (Dublin City University), 2017; Sorcha Gilroy (Edinburgh), 2019; Tim Vieira (JHU)
- internal: Jonathan May, 2010; Sujith Ravi, 2011; Stephen Tratz, 2011; Steve DeNeefe, 2011; Dirk Hovy, 2013; Will McBurney, 2016; Jin Guo, 2017; Yuxiao Dong, 2017; Sal Aguiñaga, 2017; Siyuan Jiang, 2018; Baoxu Shi, 2018; Douglas Duhaime (English, 2019); Chuxu Zhang, 2019; Satyaki Sikdar, 2021; Alex LeClair, 2022; Wenhao Yu, 2023; Daniel Gonzalez,
- Master's thesis committees: Matthew Brooks (ESTEEM, 2016); Anselme Mucunguzi (ESTEEM, 2018); Xueying Wang, 2019; Andrew Wood, 2019
Courses taught
- Fall 2024: Theory of Neural Networks
- Spring 2016–2018, Spring 2020, Fall 2020, Spring 2023, Spring 2024: Theory of Computing
- Spring 2019, Fall 2022: Programming Languages
- Spring 2015, Fall 2016–2019, Spring 2021, Fall 2021, Fall 2023: Natural Language Processing
- Fall 2015: Data Structures
- Fall 2014: Data Structures, with P. Brenner
- Fall 2013: Advanced Natural Language Processing, with K. Knight
- Fall 2007–2009, 2011–2012: Instructor, Empirical Methods in Natural Language Processing, with K. Knight and L. Huang
- Spring 2011: Statistical Machine Translation, with K. Knight and L. Huang
- Spring 2007 and 2008: Natural Language Processing, with E. Hovy et al.
- Fall 2005: Instructor, Computational Linguistics I, with P. Resnik
Publications
Journal articles
Lena Strobl, William Merrill, Gail Weiss, David Chiang, and Dana Angluin.
What formal languages can transformers express? A survey.
Transactions of the Association for Computational Linguistics, 12:543–561, 2024.
doi:10.1162/tacl_a_00663.
Patrick Soga and David Chiang.
Bridging graph position encodings for transformers with weighted graph-walking automata.
Transactions on Machine Learning Research, 2023.
David Chiang, Colin McDonald, and Chung-chieh Shan.
Exact recursive probabilistic programming.
PACMPL, 2023.
doi:10.1145/3586050.
David Chiang, Alexander M. Rush, and Boaz Barak.
Named tensor notation.
Transactions on Machine Learning Research, January 2023.
Samuel Grieggs, Bingyu Shen, Greta Rauch, Pei Li, Jiaqi Ma, David Chiang, Brian Price, and Walter Scheirer.
Measuring human perception to improve handwritten document transcription.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
doi:10.1109/TPAMI.2021.3092688.
Salvador Aguiñaga, David Chiang, and Tim Weninger.
Learning hyperedge replacement grammars for graph generation.
IEEE Trans. Pattern Analysis and Machine Intelligence, 41(3):625–638, 2019.
doi:10.1109/TPAMI.2018.2810877.
David Chiang, Frank Drewes, Daniel Gildea, Adam Lopez, and Giorgio Satta.
Weighted DAG automata for semantic graphs.
Computational Linguistics, 44(1):119–186, 2018.
Ulf Hermjakob, Qiang Li, Daniel Marcu, Jonathan May, Sebastian J. Mielke, Nima Pourdamghani, Michael Pust, Xing Shi, Kevin Knight, Tomer Levinboim, Kenton Murray, David Chiang, Boliang Zhang, Xiaoman Pan, Di Lu, Ying Lin, and Heng Ji.
Incident-driven machine translation and name tagging for low-resource languages.
Machine Translation, 32(1–2):59–89, 2018.
doi:10.1007/s10590-017-9207-1.
Steven Bird, David Chiang, Friedel Frowein, Florian Hanke, and Ashish Vaswani.
Documentary linguistics and computational linguistics: a response to Brooks.
Language Documentation and Conservation, 9:10–11, 2015.
Steven Bird, David Chiang, Friedel Frowein, Andrea L. Berez, Mark Eby, Florian Hanke, Ryan Shelby, Ashish Vaswani, and Ada Wan.
The International Workshop on Language Preservation: an experiment in text collection and language technology.
Language Documentation and Conservation, pages 155–167, 2013.
Yuval Marton, David Chiang, and Philip Resnik.
Soft syntactic constraints for Arabic-English hierarchical phrase-based translation.
Machine Translation, 26(1–2):137–157, 2012.
doi:10.1007/s10590-011-9111-z.
David Chiang.
Hope and fear for discriminative training of statistical translation models.
J. Machine Learning Research, 13:1159–1187, 2012.
A few typos corrected, in particular in the definition of the loss function.
Ken A. Dill, Adam Lucas, Julia Hockenmaier, Liang Huang, David Chiang, and Aravind K. Joshi.
Computational linguistics: a new tool for exploring biopolymer structures and statistical mechanics.
Polymer, 48(15):4289–4300, 2007.
doi:10.1016/j.polymer.2007.05.018.
David Chiang, Aravind K. Joshi, and Ken A. Dill.
A grammatical theory for the conformational changes of simple helix bundles.
J. Computational Biology, 13(1):21–42, 2006.
doi:10.1089/cmb.2006.13.21.
David Chiang, Aravind K. Joshi, and David B. Searls.
Grammatical representations of macromolecular structure.
J. Computational Biology, 13(5):1077–1100, 2006.
doi:10.1089/cmb.2006.13.1077.
Mark Dras, David Chiang, and William Schuler.
On relations of constituency and dependency grammars.
Research on Language and Computation, 2:281–305, 2004.
Books and book chapters
David Chiang.
Grammars for Language and Genes: Theoretical and Empirical Investigations.
Theory and Applications of Natural Language Processing.
Springer, 2012.
David Chiang.
Statistical parsing with an automatically extracted tree adjoining grammar.
In Rens Bod, Remko Scha, and Khalil Sima'an, editors, Data Oriented Parsing, pages 299–316.
CSLI Publications, Stanford, 2003.
Refereed conference papers
Andy Yang, David Chiang, and Dana Angluin.
Masked hard-attention transformers recognize exactly the star-free languages.
In Proc. NeurIPS. 2024.
To appear.
Ken Sible and David Chiang.
Improving rare word translation with dictionaries and attention masking.
In Proc. AMTA. 2024.
To appear.
Andy Yang and David Chiang.
Counting like transformers: compiling temporal counting logic into softmax transformers.
In Proc. CoLM. 2024.
To appear.
Chihiro Taguchi and David Chiang.
Language complexity and speech recognition accuracy: orthographic complexity hurts, phonological complexity doesn't.
In Proc. ACL. 2024.
Outstanding Paper Award and Senior Area Chair Award.
Fahim Faisal, Orevaoghene Ahia, Aarohi Srivastava, Kabir Ahuja, David Chiang, Yulia Tsvetkov, and Antonios Anastasopoulos.
DIALECTBENCH: an NLP benchmark for dialects, varieties, and closely-related languages.
In Proc. ACL. 2024.
Social Impact Award.
Stephen Bothwell, Brian DuSell, David Chiang, and Brian Krostenko.
PILA: a historical-linguistic dataset of Proto-Italic and Latin.
In Proc. LREC-COLING, 12749–12760. 2024.
Chihiro Taguchi, Jefferson Saransig, Dayana Velásquez, and David Chiang.
KILLKAN: the automatic speech recognition dataset for Kichwa with morphosyntactic information.
In Proc. LREC-COLING, 9753–9763. 2024.
Brian DuSell and David Chiang.
Stack attention: improving the ability of transformers to model hierarchical patterns.
In Proc. ICLR. 2024.
Stephen Bothwell, Justin DeBenedetto, Theresa Crnkovich, Hildegund Müller, and David Chiang.
Introducing rhetorical parallelism detection: a new task with datasets, metrics, and baselines.
In Proc. EMNLP, 5007–5039. 2023.
doi:10.18653/v1/2023.emnlp-main.305.
Alexandra Butoi, Tim Vieira, Ryan Cotterell, and David Chiang.
Efficient algorithms for recognizing weighted tree-adjoining languages.
In Proc. EMNLP. 2023.
Aarohi Srivastava and David Chiang.
BERTwich: extending BERT's capabilities to model dialectal and noisy text.
In Findings of ACL: EMNLP. 2023.
Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, and David Chiang.
Universal automatic phonetic transcription into the International Phonetic Alphabet.
In Proc. INTERSPEECH. 2023.
doi:10.21437/Interspeech.2023-2584.
Alexandra Butoi, Ryan Cotterell, and David Chiang.
Convergence and diversity in the control hierarchy.
In Proc. ACL. 2023.
David Chiang, Peter Cholak, and Anand Pillay.
Tighter bounds on the expressivity of transformer encoders.
In Proc. ICML, 5544–5562. 2023.
Brian DuSell and David Chiang.
The surprising computational power of nondeterministic stack RNNs.
In Proc. ICLR. 2023.
Alexandra Butoi, Brian DuSell, Tim Vieira, Ryan Cotterell, and David Chiang.
Algorithms for weighted pushdown automata.
In Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang, editors, Proc. EMNLP, 9669–9680. 2022.
doi:10.18653/v1/2022.emnlp-main.656.
David Chiang and Peter Cholak.
Overcoming a theoretical limitation of self-attention.
In Smaranda Muresan, Preslav Nakov, and Aline Villavicencio, editors, Proc. ACL, volume 1, 7654–7664. 2022.
doi:10.18653/v1/2022.acl-long.527.
Brian DuSell and David Chiang.
Learning hierarchical structures with differentiable nondeterministic stacks.
In Proc. ICLR. 2022.
Toan Q. Nguyen, Kenton Murray, and David Chiang.
Data augmentation by concatenation for low-resource translation: a mystery and a solution.
In Proc. Conference on Spoken Language Translation. 2021.
David Chiang and Darcey Riley.
Factor graph grammars.
In Proc. NeurIPS, 6648–6658. 2020.
Brian DuSell and David Chiang.
Learning context-free languages with nondeterministic stack RNNs.
In Proc. CoNLL, 507–519. 2020.
Justin DeBenedetto and David Chiang.
Representing unordered data using complex-weighted multiset automata.
In Proc. ICML, 2412–2420. 2020.
Arturo Argueta and David Chiang.
Accelerating sparse matrix operations in neural networks on graphics processing units.
In Proc. ACL, 6215–6224. 2019.
Antonios Anastasopoulos, Alison Lui, Toan Q. Nguyen, and David Chiang.
Neural machine translation of text from non-native speakers.
In Proc. NAACL: HLT, volume 1, 3070–3080. 2019.
Kenton Murray and David Chiang.
Correcting length bias in neural machine translation.
In Proc. WMT, 212–223. 2018.
Antonios Anastasopoulos, Marika Lekakou, Josep Quer, Eleni Zimianiti, Justin DeBenedetto, and David Chiang.
Part-of-speech tagging on an endangered language: a parallel Griko-Italian resource.
In Proc. COLING, 2529–2539. 2018.
Arturo Argueta and David Chiang.
Composing finite state transducers on GPUs.
In Proc. ACL, 2697–2705. 2018.
Justin DeBenedetto and David Chiang.
Algorithms and training for weighted multiset automata and regular expressions.
In Proc. Conference on Implementation and Applications of Automata, 146–158. 2018.
Antonios Anastasopoulos and David Chiang.
Leveraging translations for speech transcription in low-resource settings.
In Proc. INTERSPEECH. 2018.
Corey Pennycuff, Satyaki Sikdar, Catalina Vajiac, David Chiang, and Tim Weninger.
Synchronous hyperedge replacement graph grammars.
In Proc. Conference on Graph Transformations. 2018.
Antonios Anastasopoulos and David Chiang.
Tied multitask learning for neural speech translation.
In Proc. NAACL: HLT, volume 1, 82–91. 2018.
Toan Nguyen and David Chiang.
Improving lexical choice in neural machine translation.
In Proc. NAACL: HLT, volume 1, 334–343. 2018.
Huadong Chen, Shujian Huang, David Chiang, Xinyu Dai, and Jiajun Chen.
Combining character and word information in neural machine translation using a multi-level attention.
In Proc. NAACL: HLT, volume 1, 1284–1293. 2018.
Toan Q. Nguyen and David Chiang.
Transfer learning across low-resource, related languages for neural machine translation.
In Proc. IJCNLP, volume 2, 296–301. 2017.
Huadong Chen, Shujian Huang, David Chiang, Xin-Yu Dai, and Jiajun Chen.
Top-rank enhanced listwise optimization for statistical machine translation.
In Proc. CoNLL, 90–99. 2017.
Huadong Chen, Shujian Huang, David Chiang, and Jiajun Chen.
Improved neural machine translation with a syntax-aware encoder and decoder.
In Proc. ACL, volume 1, 1936–1945. 2017.
Arturo Argueta and David Chiang.
Decoding with finite-state transducers on GPUs.
In Proc. EACL, volume 1, 1044–1052. 2017.
Antonios Anastasopoulos, David Chiang, and Long Duong.
An unsupervised probability model for speech-to-translation alignment of low-resource languages.
In Proc. EMNLP, 1255–1263. 2016.
Salvador Aguiñaga, Rodrigo Palacios, David Chiang, and Tim Weninger.
Growing graphs from hyperedge replacement graph grammars.
In Proc. CIKM, 469–478. 2016.
doi:10.1145/2983323.2983826.
Long Duong, Antonios Anastasopoulos, David Chiang, Steven Bird, and Trevor Cohn.
An attentional model for speech translation without transcription.
In Proc. NAACL: HLT, 949–959. 2016.
Tomer Levinboim and David Chiang.
Supervised phrase table triangulation with neural word embeddings for low-resource languages.
In Proc. EMNLP, 1079–1083. 2015.
Tomer Levinboim and David Chiang.
Multi-task word alignment triangulation for low-resource languages.
In Proc. NAACL: HLT, 1221–1226. 2015.
Kenton Murray and David Chiang.
Auto-sizing neural networks: with applications to \(n\)-gram language models.
In Proc. EMNLP, 908–916. 2015.
Tomer Levinboim, Ashish Vaswani, and David Chiang.
Model invertibility regularization: sequence alignment with or without parallel data.
In Proc. NAACL: HLT, 609–618. 2015.
Theerawat Songyot and David Chiang.
Improving word alignment using word similarity.
In Proc. EMNLP, 1840–1845. 2014.
Hui Zhang and David Chiang.
Kneser-Ney smoothing on expected counts.
In Proc. ACL, volume 1, 765–774. 2014.
Ashish Vaswani, Yinggong Zhao, Victoria Fossum, and David Chiang.
Decoding with large-scale neural language models improves translation.
In Proc. EMNLP, 1387–1392. 2013.
David Chiang, Jacob Andreas, Daniel Bauer, Karl Moritz Hermann, Bevan Jones, and Kevin Knight.
Parsing graphs with hyperedge replacement grammars.
In Proc. ACL, volume 1, 924–932. 2013.
Steven Bird and David Chiang.
Machine translation for language preservation.
In Proc. COLING, 125–134. 2012.
Ashish Vaswani, Liang Huang, and David Chiang.
Smaller alignment models for better translations: unsupervised word alignment with the \(\ell _0\)-norm.
In Proc. ACL, volume 1, 311–319. 2012.
Hui Zhang and David Chiang.
An exploration of forest-to-string translation: does translation help or hurt parsing?
In Proc. ACL, volume 2, 317–321. 2012.
Ashish Vaswani, Haitao Mi, Liang Huang, and David Chiang.
Rule Markov models for fast tree-to-string translation.
In Proc. ACL: HLT, 856–864. 2011.
Dirk Hovy, Ashish Vaswani, Stephen Tratz, David Chiang, and Eduard Hovy.
Models and training for unsupervised preposition sense disambiguation.
In Proc. ACL: HLT, 323–328. 2011.
David Chiang, Steve DeNeefe, and Michael Pust.
Two easy improvements to lexical weighting.
In Proc. ACL: HLT, 455–460. 2011.
Shu Cai, David Chiang, and Yoav Goldberg.
Language-independent parsing with empty elements.
In Proc. ACL: HLT, 212–216. 2011.
Ashish Vaswani, Adam Pauls, and David Chiang.
Efficient optimization of an MDL-inspired objective function for unsupervised part-of-speech tagging.
In Proc. ACL, 209–214. 2010.
David Chiang.
Learning to translate with source and target syntax.
In Proc. ACL, 1443–1452. 2010.
David Chiang, Jonathan Graehl, Kevin Knight, Adam Pauls, and Sujith Ravi.
Bayesian inference for finite-state transducers.
In HLT: NAACL, 447–455. 2010.
Adam Pauls, Dan Klein, David Chiang, and Kevin Knight.
Unsupervised syntactic alignment with inversion transduction grammars.
In HLT: NAACL, 118–126. 2010.
Sujith Ravi, Ashish Vaswani, Kevin Knight, and David Chiang.
Fast, greedy model minimization for unsupervised tagging.
In Proc. COLING, 940–948. 2010.
John DeNero, David Chiang, and Kevin Knight.
Fast consensus decoding over translation forests.
In Proc. ACL-IJCNLP, 567–575. 2009.
David Chiang, Kevin Knight, and Wei Wang.
11,001 new features for statistical machine translation.
In Proc. HLT: NAACL, 218–226. 2009.
Best Paper Award.
David Chiang, Steve DeNeefe, Yee Seng Chan, and Hwee Tou Ng.
Decomposability of translation metrics for improved evaluation and efficient algorithms.
In Proc. EMNLP, 610–619. 2008.
David Chiang, Yuval Marton, and Philip Resnik.
Online large-margin training of syntactic and structural translation features.
In Proc. EMNLP, 224–233. 2008.
Hao Zhang, Daniel Gildea, and David Chiang.
Extracting synchronous grammar rules from word-level alignments in linear time.
In Proc. COLING, 1081–1088. 2008.
Liang Huang and David Chiang.
Forest rescoring: faster decoding with integrated language models.
In Proc. ACL, 144–151. 2007.
Yee Seng Chan, Hwee Tou Ng, and David Chiang.
Word sense disambiguation improves statistical machine translation.
In Proc. ACL, 33–40. 2007.
David Chiang, Mona Diab, Nizar Habash, Owen Rambow, and Safiullah Shareef.
Parsing Arabic dialects.
In Proc. EACL. 2006.
David Chiang, Adam Lopez, Nitin Madnani, Christof Monz, Philip Resnik, and Michael Subotin.
The Hiero machine translation system: extensions, evaluation, and analysis.
In Proc. HLT-EMNLP, 779–786. 2005.
David Chiang.
A hierarchical phrase-based model for statistical machine translation.
In Proc. ACL, 263–270. 2005.
doi:10.3115/1219840.1219873.
Best Paper Award.
David Chiang.
Mildly context sensitive grammars for estimating maximum entropy models.
In Gerald Penn, editor, Proc. Conference on Formal Grammar. 2003.
David Chiang and Aravind K. Joshi.
Formal grammars for estimating partition functions of double-stranded chain molecules.
In Proc. HLT, 63–67. 2002.
David Chiang and Daniel M. Bikel.
Recovering latent information in treebanks.
In Proc. COLING. 2002.
Fudong Chiou, David Chiang, and Martha Palmer.
Facilitating treebank annotation using a statistical parser.
In Proc. HLT. 2001.
David Chiang.
Statistical parsing with an automatically-extracted tree adjoining grammar.
In Proc. ACL, 456–463. 2000.
doi:10.3115/1075218.1075276.
William Schuler, David Chiang, and Mark Dras.
Multi-component TAG and notions of formal power.
In Proc. ACL, 448–455. 2000.
doi:10.3115/1075218.1075275.
Refereed workshop papers
Stephen Bothwell, Abigail Swenor, and David Chiang.
Nostra Domina at EvaLatin 2024: improving Latin polarity detection through data augmentation.
In Proc. Workshop on Language Technologies for Historical and Ancient Languages, 215–222. 2024.
Aarohi Srivastava and David Chiang.
Fine-tuning BERT with character-level noise for zero-shot transfer to dialects and closely-related languages.
In Proc. Workshop on NLP for Similar Languages, Varieties and Dialects. 2023.
Chihiro Taguchi and David Chiang.
Introducing morphology in Universal Dependencies Japanese.
In Proc. Workshop on Universal Dependencies, 65–72. 2023.
Darcey Riley and David Chiang.
A continuum of generation tasks for investigating length bias and degenerate repetition.
In Proc. BlackboxNLP. 2022.
Colin McDonald and David Chiang.
Syntax-based attention masking for neural machine translation.
In Proc. NAACL Student Research Workshop. 2021.
Kenton Murray, Jeffery Kinnison, Toan Q. Nguyen, Walter Scheirer, and David Chiang.
Auto-sizing the Transformer network: improving speed, efficiency, and performance for low-resource machine translation.
In Proc. Workshop on Neural Generation and Translation, 231–240. 2019.
Kenton Murray, Brian DuSell, and David Chiang.
Efficiency through auto-sizing: Notre Dame NLP's submission to the WNGT 2019 efficiency task.
In Proc. Workshop on Neural Generation and Translation, 297–301. 2019.
doi:10.18653/v1/D19-5634.
Xinyi Wang, Salvador Aguiñaga, Tim Weninger, and David Chiang.
Growing better graphs with latent-variable probabilistic graph grammars.
In Proc. Workshop on Mining and Learning with Grammars. 2018.
Antonios Anastasopoulos, Sameer Bansal, David Chiang, Sharon Goldwater, and Adam Lopez.
Spoken term discovery for language documentation using translations.
In Proc. Workshop on Speech-Centric NLP, 53–58. 2017.
Antonios Anastasopoulos and David Chiang.
A case study on using speech-to-translation alignments for language documentation.
In Proc. Workshop on Use of Computational Methods in Study of Endangered Languages, 170–178. 2017.
David Chiang and Tatjana Scheffler.
Flexible composition and delayed tree-locality.
In Proc. TAG+, 17–24. 2008.
David Chiang and Owen Rambow.
The hidden TAG model: synchronous grammars for parsing resource-poor languages.
In Proc. TAG+, 1–8. 2006.
David Chiang.
The weak generative capacity of linear tree-adjoining grammars.
In Proc. TAG+, 25–32. 2006.
Liang Huang and David Chiang.
Better \(k\)-best parsing.
In Proc. IWPT, 53–64. 2005.
David Chiang.
Uses and abuses of intersected languages.
In Proc. TAG+, 9–15. 2004.
David Chiang.
Putting some weakly context-free formalisms in order.
In Proc. TAG+, 11–18. 2002.
Daniel M. Bikel and David Chiang.
Two statistical parsing models applied to the Chinese Treebank.
In Proc. Chinese Language Processing Workshop, 1–6. 2000.
doi:10.3115/1117769.1117771.
David Chiang, William Schuler, and Mark Dras.
Some remarks on an extension of synchronous TAG.
In Proc. TAG+, 61–66. 2000.
Mark Dras, David Chiang, and William Schuler.
A multi-level TAG approach to dependency.
In Proc. ESSLLI Workshop on Linguistic Theory and Grammar Implementation, 33–46. 2000.
Invited presentations
- Transformer Expressivity and Formal Logic. Workshop on Transformers as a Computational Model, Simons Institute for the Theory of Computing, 2024/09/24.
- Expressivity of Transformers: Logic, Circuits, and Formal Languages. European Summer School in Logic, Language, and Information, 2024/07/29–08/02, with J. Rawski, L. Strobl, and A. Yang.
- What Large Language Models Can and Can't Do. Annual Conference of the Society of Catholic Scientists, 2024/06/08.
- Tighter Bounds on the Expressivity of Transformer Encoders. Oxford University, Data, Knowledge and Action Seminar (online), 2023/10/10.
- Transformers and Formal Logic. Seminar on Formal Languages and Neural Networks (online), 2023/06/12.
- Exact Recursive Probabilistic Programming. Indiana University, Logic Seminar, 2023/04/26.
- Exact Recursive Probabilistic Programming. Johns Hopkins University, CLSP Seminar, 2022/10/16.
- Overcoming a Theoretical Limitation of Self-Attention. Seminar on Formal Languages and Neural Networks (online), 2022/07/11.
- Overcoming a Theoretical Limitation of Self-Attention. University of Chicago, 2022/05/06.
- Two Ways of Thinking about Weighted Relations. NeurIPS 2021 Workshop on Databases and AI, 2021/12/13.
- Panel on AI in Education, Midwest AI Day, 2021/08/03.
- Automatic Augmentation of Language Documentation. Midwest Speech and Language Days, Toyota Technological Institute Chicago, 2019/06/01.
- Hierarchical and Syntax-Based Translation. MT Marathon in the Americas, Dayton, Ohio, 2017/05/22.
- Speech-to-Translation Alignment for Documentation of Endangered Languages. Dublin City University, 2016/12/01; USC Information Sciences Institute, 2017/01/09; Indiana University, 2017/03/03.
- Finite Automata for Free Word Order Languages. Workshop on Discontinuous Structures in Natural Language Processing, 2016/06/17.
- Language Models; Hierarchical and Syntax-Based Translation. MT Marathon in the Americas, University of Notre Dame, 2016/05/17, 19.
- Machine Translation. Chinese Information Processing Society Summer School, Peking University, 2015/07/25.
- Hierarchical and Syntax-Based Translation. MT Marathon in the Americas, Urbana-Champaign, 2015/05/12.
- Graph Grammars and Automata for NLP. Toyota Technological Institute at Chicago Colloquium, 2014/09/29; Johns Hopkins University, CLSP Seminar, 2014/10/31; Carnegie Mellon University, LTI Colloquium, 2014/11/07.
- Learning Syntax and Semantics for Machine Translation. University of Texas AI Colloquium, 2014/10/04; University of Notre Dame CSE Seminar, 2014/10/10; University of Michigan, 2013/10/14.
- Machine translation: what is it and what can('t) it do? Summer Institute of Linguistics, Ukarumpa, Papua New Guinea, 2012/05/30.
- Machine translation for language preservation: Improving access to knowledge and heritage for small languages. Macquarie University, Sydney, 2012/05/18.
- Synchronous Grammars, tutorial given at TAG+11, 2012/09/26.
- Hope and fear for discriminative training of statistical translation models. Columbia University, 2011/05/17.
- Learning to translate with source and target syntax. IBM TJ Watson Research Center, 2010/06/14; Carnegie Mellon University, Language Technologies Institute, 2010/06/16.
- Towards tree-to-tree translation. Cambridge University, 2010/02/01; University of Edinburgh, School of Informatics, 2010/02/05.
- What can computational linguistics do for computational stemmatology? Studia Stemmatologica, University of Helsinki, 2010/01/28.
- Online large-margin training of syntactic and structural translation features. Johns Hopkins University, CLSP Seminar, 2008/09/16.
- Microsoft Research Asia, Beijing, 2007/09/19, 2007/09/26.
- Harvard University, Computer Science Colloquium, 2006/11/09.
- Synchronous Grammars and Tree Transducers, tutorial given at ACL-COLING 2006 with K. Knight, 2006/07/16.
- National University of Singapore, School of Computing, 2006/04/11.
- NIPS 2005 Workshop on Advances in Structured Learning for Text and Speech Processing, 2005/12/09.
- NYU Department of Computer Science, 2005/09/16.
- Some Computational Complexity Results for Synchronous Context-Free Grammars. USC Information Sciences Institute, NL Seminar, 2005/09/30.
- Hiero: Finding Structure in Statistical Machine Translation. Google, Inc., 2005/06/13; USC Information Sciences Institute, NL Seminar, 2005/07/06.
- From phrase-based towards syntax-based machine translation. JHU CLSP, 2005/02/08.
- Sizing up formal grammars for statistical parsing and translation. Language Weaver, Inc., 2004/06/03.
- Putting formal grammars to work. Univ. of MD Inst. for Advanced Computer Studies, Computational Linguistics Colloquium, 2004/04/26.
- Formal grammars for biological sequence analysis. NYU Department of Computer Science, 2002/04/12.
- Extracting tree adjoining grammars for statistical parsing of English and Chinese. Univ. of MD Inst. for Advanced Computer Studies, Computational Linguistics Colloquium, 2001/08/15.
Grants awarded
- Learning to Retrieve Structured Information for Summarization and Translation of Unstructured Text. NSF, 2022–present. With co-PI Meng Jiang. $500,000.
- NL(V)P: Natural Language (Variety) Processing. With A. Anastasopoulos and Y. Tsvetkov. NSF, 2021–present. ND portion $165,000.
- Language Documentation with an Artificial Intelligence Helper. With A. Anastasopoulos and G. Walther. NSF, 2021–present. ND portion $210,000.
- Differentiable Probabilistic Programming with Recursive Structured Models. With C. C. Shan. NSF. ND portion $375,000.
- Genera Dicendi: Computer-assisted analysis of Augustine’s homiletic style. Notre Dame Faculty Research Support Program. With H. Müller. $97,000.
- Summarization and Domain-Adaptive Retrieval of Information Across Languages. IARPA MATERIAL (subcontract with USC/ISI), $273,000.
- Neural Pushdown and Tree-Stack Automata. 2017–2019. Google Faculty Research Award, $61,500.
- New Directions for Whole-Sentence Training of Neural Translation Models. 2017. Amazon Academic Research Award, $58,000.
- Exploiting Language Information for Situational Awareness. 2015–2021. DARPA LORELEI (subcontract with USC/ISI), $675,000.
- Learning Better Translation Models by Learning More Translation Models. 2015–2016. Google Faculty Research Award, $42,000.
- Language Induction Meets Language Documentation: Leveraging Bilingual Aligned Audio for Learning and Preserving Languages. With S. Bird. 2014–2017. NSF, $470,000.
- Training Machine Translation Models as Deep Architectures. 2013–2014. Google Faculty Research Award, $84,000.
- Automatic Knowledge Acquisition for Language Translation. 2012–2014. DARPA Computer Science Study Panel, $500,000.
- Machine Translation for Language Preservation. With S. Bird. 2011–2012. NSF EAGER, $180,000.
- TAU: Learning Natural Language Structure using Multilingual Texts. 2010–2012. DARPA Computer Science Study Panel, $400,000.
- Phylo: Phylogenetic Reconstruction of Textual Histories. 2010–2011. NSF EAGER, $75,000.
- Structured Learning for Structured Machine Translation. 2009–2010. DARPA Computer Science Study Panel, $100,000.
Professional activities
- NAACL executive board, 2011–2012
- Secretary, ACL Special Interest Group on Machine Translation, 2009–2021
- Editorial board: Computational Linguistics, 2006–2008; Artificial Intelligence Research, 2009–2012; Machine Translation Journal, 2012–present
- Action editor, Transactions of the ACL, 2012–present
- Guest editor, ACM Trans. Asian Language Information Processing Special Issue on Machine Translation
- Program co-chair, EMNLP 2018, with J. Hockenmaier and J. Tsujii
- Program co-chair, 12th International Workshop on Tree Adjoining Grammars and Related Formalisms, 2016, with A. Koller
- Area chair: ACL 2006, 2008, EMNLP 2017, 2019 (machine translation and multilinguality); EMNLP 2007, 2010, IJCNLP 2011 (machine translation); NAACL 2018 (theory and formalisms); NAACL 2021 (parsing)
- Senior area chair: ACL 2020 (syntax and parsing)
- Reviewer: Artificial Intelligence, Trans. Audio, Speech and Language Processing, J. Natural Language Engineering, J. Computer and System Sciences, Science, and others
- Reviewer: ACL, NAACL, EACL, IJCNLP, EMNLP, NIPS, ICML, IJCAI, AAAI
- NSF review panelist, 2002, 2010, 2013, 2014, 2016, 2021, 2022
- Co-organizer, Theory of Neural Language Models, Dagstuhl Seminar 25282, with P. Barceló, G. Cybenko, and L. Strobl, 2025.
- Organizer, NAACL Workshop on Syntax and Structure in Statistical Translation, with D. Wu, 2007–2009
- Tutorials chair: ACL-IJCNLP 2021
- Publications chair, EMNLP 2009
- Data lead for ACL Anthology, 2019–2022
- Local organizer for NAACL HLT 2010, with E. Hovy, J. May, and J. Riesa
- Local organizer for Midwest Speech and Language Days, 2018
- Organizer, Machine Translation Marathon in the Americas, 2016
- Local site coordinator for North American Computational Linguistics Olympiad, 2009–present
Other information
- Citizenship: U.S.A.
- Professional societies: Association for Computational Linguistics, Society of Catholic Scientists
- Languages: English (native), Mandarin Chinese (basic), ancient Latin and Greek (reading)