- Ph.D. in Computer and Information Science, University of Pennsylvania, 2004. Advisor: Aravind K. Joshi. Dissertation: Evaluation of Grammar Formalisms for Applications to Natural Language Processing and Biological Sequence Analysis. Published as: Grammars for Language and Genes: Theoretical and Empirical Investigations, Springer, 2012.
- S.M. in Computer Science, Harvard University, 1997.
- A.B. cum laude in Computer Science, Harvard University, 1997.
Honors and awards
- Best paper award, with W. Wang and K. Knight, NAACL HLT 2009
- Best paper award, ACL 2005
- Morris and Dorothy Rubinoff Award, 2005, University of Pennsylvania, for a dissertation that represents an advance in innovative application of computer technology
- Phi Beta Kappa, fall 1996, and Detur Book Prize, 1994, Harvard University, awarded to top 5% of class
Employment
- 2014–present: Associate professor, University of Notre Dame, Department of Computer Science and Engineering.
- 2014–present: Adjunct associate professor, USC Department of Computer Science.
- 2013–2014: Project leader, USC Information Sciences Institute.
- 2007–present: Research assistant professor, USC Department of Computer Science.
- 2006–2012: Computer scientist, USC Information Sciences Institute.
- Summer 2005: Senior researcher, Johns Hopkins CLSP Summer Workshop.
- 2004–2005: Postdoctoral research associate, Univ. of Maryland Institute for Advanced Computer Studies.
Advising
- Former PhD students: Ashish Vaswani (2014; now research scientist at USC/ISI)
- Former postdocs: Victoria Fossum (2013; now at Google)
- Current PhD students: Tomer Levinboim, Antonios Anastasopoulos, Kenton Murray, Arturo Argueta
- Summer interns: Michael Bloodgood (Delaware), Wei Ho (Princeton), Paramveer Dhillon (Penn), John DeNero (Berkeley), Amittai Axelrod (Washington), Adam Pauls (Berkeley), Yoav Goldberg (Ben Gurion), Anne Irvine (JHU), Xuchen Yao (JHU), Ada Wan (CUNY), Jackie Lee (MIT), Arvind Neelakantan (Columbia/UMass)
- PhD thesis committees
- external: Adam Lopez (Univ. of Maryland), 2008; Hendra Setiawan (Natl. Univ. of Singapore), 2008; John DeNero (Berkeley), 2010; Kevin Gimpel (CMU), 2012; Baskaran Sankaran (Simon Fraser), 2013; Andrea Gesmundo (Geneva), 2013
- internal: Jonathan May, 2010; Sujith Ravi, 2011; Stephen Tratz, 2011; Steve DeNeefe, 2011; Dirk Hovy, 2013
Teaching
- Fall 2007–2009, 2011–2012: Instructor, Empirical Methods in Natural Language Processing, Univ. of Southern California, with K. Knight and L. Huang
- Spring 2011: Instructor, Statistical Machine Translation, Univ. of Southern California, with K. Knight and L. Huang
- Spring 2007 and 2008: Instructor, Natural Language Processing, Univ. of Southern California, with E. Hovy et al.
- Fall 2005: Instructor, Computational Linguistics I, Univ. of Maryland, with Philip Resnik
Journal articles
- D. Chiang, 2012. Hope and fear for discriminative training of statistical translation models. J. Machine Learning Research 13:1159–1187.
- Y. Marton, D. Chiang, and P. Resnik. 2012. Soft syntactic constraints for Arabic-English hierarchical phrase-based translation. Machine Translation 26(1–2):137–157.
- D. Chiang, 2007. Hierarchical phrase-based translation. Computational Linguistics 33(2):201–228.
- K. A. Dill, A. Lucas, J. Hockenmaier, L. Huang, D. Chiang, and A. K. Joshi, 2007. Computational linguistics: a new tool for exploring biopolymer structures and statistical mechanics. Polymer 48:4289–4300.
- D. Chiang, A. K. Joshi, and D. B. Searls, 2006. Grammatical representations of macromolecular structure. J. Computational Biology.
- D. Chiang, A. K. Joshi, and K. A. Dill, 2006. A grammatical theory for the conformational changes of simple helix bundles. J. Computational Biology 13(1):21–42.
- M. Dras, D. Chiang, and W. Schuler, 2004. On relations of constituency and dependency grammars. Research on Language and Computation 2(2):281–305.
Books and book chapters
- D. Chiang. 2012. Grammars for Language and Genes: Theoretical and Empirical Investigations. Springer.
- D. Chiang. 2003. Statistical parsing with an automatically extracted tree adjoining grammar. In R. Bod et al., editors, Data Oriented Parsing. CSLI Publications, Stanford, pages 299–316.
Refereed conference papers
- A. Anastasopoulos, L. Duong, and D. Chiang, 2016. An unsupervised probability model for speech-to-translation alignment of low-resource languages. To appear in Proc. EMNLP.
- S. Aguinaga, R. Palacios, D. Chiang, and T. Weninger, 2016. Growing graphs with hyperedge replacement graph grammars. To appear in Proc. CIKM.
- L. Duong, T. Cohn, S. Bird, and D. Chiang, 2016. An attentional model for speech translation without transcription. In Proc. NAACL HLT.
- K. Murray and D. Chiang, 2015. Auto-sizing neural networks: with applications to n-gram language models. In Proc. EMNLP.
- T. Levinboim and D. Chiang, 2015. Supervised phrase table triangulation with neural word embeddings for low-resource languages. In Proc. EMNLP.
- T. Levinboim, A. Vaswani, and D. Chiang, 2015. Model invertibility regularization: Sequence alignment with or without parallel data. In Proc. NAACL HLT, pages 609–618.
- T. Levinboim and D. Chiang, 2015. Multi-task word alignment triangulation for low-resource languages. In Proc. NAACL HLT, pages 1221–1226.
- T. Songyot and D. Chiang. 2014. Improving word alignment using word similarity. In Proc. EMNLP.
- H. Zhang and D. Chiang. 2014. Kneser-Ney smoothing on expected counts. In Proc. ACL, pages 765–774.
- A. Vaswani, Y. Zhao, V. Fossum, and D. Chiang. 2013. Decoding with large-scale neural language models improves translation. In Proc. EMNLP, pages 1387–1392.
- D. Chiang, J. Andreas, D. Bauer, K. M. Hermann, B. Jones, and K. Knight. 2013. Parsing graphs with hyperedge replacement grammars. In Proc. ACL, pages 924–932.
- A. Vaswani, L. Huang, and D. Chiang. 2012. Smaller alignment models for better translations: unsupervised word alignment with the ℓ0 norm. In Proc. ACL, pages 311–319.
- H. Zhang and D. Chiang. 2012. An exploration of forest-to-string translation: Does translation help or hurt parsing? In Proc. ACL (Vol. 2: Short Papers), pages 317–321.
- A. Vaswani, H. Mi, L. Huang, and D. Chiang. 2011. Rule Markov models for fast tree-to-string translation. In Proc. ACL, pages 856–864.
- D. Chiang, S. DeNeefe, and M. Pust. 2011. Two easy improvements to lexical weighting. In Proc. ACL (Vol. 2: Short Papers), pages 455–460.
- S. Cai, D. Chiang, and Y. Goldberg. 2011. Language-independent parsing with empty elements. In Proc. ACL (Vol. 2: Short Papers), pages 212–216.
- A. Vaswani, A. Pauls, and D. Chiang, 2010. Efficient optimization of an MDL-inspired objective function for unsupervised part-of-speech tagging. In Proc. ACL, pages 209–214.
- D. Chiang. 2010. Learning to translate with source and target syntax. In Proc. ACL, pages 1443–1452.
- D. Chiang, J. Graehl, K. Knight, A. Pauls, and S. Ravi. 2010. Bayesian inference for finite-state transducers. In Proc. NAACL HLT, pages 447–455.
- A. Pauls, D. Klein, D. Chiang, and K. Knight, 2010. Unsupervised syntactic alignment with inversion transduction grammars. In Proc. NAACL HLT, pages 118–126.
- J. DeNero, D. Chiang, and K. Knight, 2009. Fast consensus decoding over translation forests. In Proc. ACL.
- D. Chiang, W. Wang, and K. Knight, 2009. 11,001 new features for statistical machine translation. In Proc. NAACL HLT, pages 218–226. Best paper award.
- D. Chiang, Y. Marton, and P. Resnik, 2008. Online large-margin training of syntactic and structural translation features. In Proc. EMNLP, pages 224–233.
- D. Chiang, S. DeNeefe, Y. S. Chan, and H. T. Ng, 2008. Decomposability of translation metrics for improved evaluation and efficient algorithms. In Proc. EMNLP, pages 610–619.
- H. Zhang, D. Gildea, and D. Chiang. 2008. Extracting synchronous grammar rules from word-level alignments in linear time. In Proc. COLING.
- L. Huang and D. Chiang, 2007. Forest rescoring: faster decoding with integrated language models. In Proc. ACL, pages 144–151.
- Y. S. Chan, H. T. Ng, and D. Chiang, 2007. Word sense disambiguation improves statistical machine translation. In Proc. ACL, pages 33–40.
- D. Chiang, M. Diab, N. Habash, O. Rambow, and S. Shareef, 2006. Parsing Arabic dialects. In Proc. EACL, Trento, pages 369–376.
- D. Chiang, A. Lopez, N. Madnani, C. Monz, P. Resnik, and M. Subotin, 2005. The Hiero machine translation system: extensions, evaluation, and analysis. In Proc. HLT/EMNLP, Vancouver, pages 779–786.
- D. Chiang, 2005. A hierarchical phrase-based model for statistical machine translation. In Proc. ACL, Ann Arbor, MI, pages 263–270. Best paper award.
- D. Chiang, 2003. Mildly context sensitive grammars for estimating maximum entropy models. In Proc. Formal Grammar, Vienna, August. CSLI Publications.
- D. Chiang and D. M. Bikel, 2002. Recovering latent information in treebanks. In Proc. COLING, Taipei, August, pages 183–189.
- D. Chiang and A. K. Joshi, 2002. Formal grammars for estimating partition functions of double-stranded chain molecules. In Proc. HLT, San Diego, March, pages 63–67.
- D. Chiang, 2001. Constraints on strong generative power. In Proc. ACL, Toulouse, July, pages 124–131.
- F.-D. Chiou, D. Chiang, and M. Palmer, 2001. Facilitating treebank annotation using a statistical parser. In Proc. HLT, poster session, San Diego, March, pages 117–120.
- D. Chiang, 2000. Statistical parsing with an automatically-extracted tree adjoining grammar. In Proc. ACL, pages 456–463.
- W. Schuler, D. Chiang, and M. Dras, 2000. Multi-component TAG and notions of formal power. In Proc. ACL, pages 448–455.
Refereed workshop papers
- D. Chiang, 2006. The weak generative capacity of linear tree adjoining grammars. In Proc. TAG+8, Sydney, July.
- D. Chiang and O. Rambow, 2006. The hidden TAG model: synchronous grammars for parsing resource-poor languages. In Proc. TAG+8, Sydney, July, pages 1–8.
- L. Huang and D. Chiang, 2005. Better k-best parsing. In Proc. IWPT, Vancouver, October, pages 53–64.
- D. Chiang, 2004. Uses and abuses of intersected languages. In Proc. TAG+7, Vancouver, May.
- D. Chiang, 2002. Putting some weakly context-free formalisms in order. In Proc. TAG+6, Venice, May, pages 11–18.
- D. M. Bikel and D. Chiang, 2000. Two statistical parsing models applied to the Chinese Treebank. In Proceedings of the Second Chinese Language Processing Workshop, Hong Kong, October, pages 1–6.
- M. Dras, D. Chiang, and W. Schuler, 2000. A multi-level TAG approach to dependency. In Proceedings of the ESSLLI-2000 Workshop on Linguistic Theory and Grammar Implementation, Birmingham, UK, August, pages 33–46.
- D. Chiang, W. Schuler, and M. Dras, 2000. Some remarks on an extension of synchronous TAG. In Proc. TAG+5, Paris, May, pages 61–66.
Invited talks
- Finite Automata for Free Word Order Languages. Workshop on Discontinuous Structures in Natural Language Processing, 2016/06/17.
- Machine Translation. Chinese Information Processing Society Summer School, Peking University, 2015/07/25.
- Hierarchical and Syntax-Based Translation. MT Marathon in the Americas, Urbana-Champaign, 2015/05/12.
- Graph Grammars and Automata for NLP. Toyota Technological Institute at Chicago Colloquium, 2014/09/29; Johns Hopkins University, CLSP Seminar, 2014/10/31; Carnegie Mellon University, LTI Colloquium, 2014/11/07.
- Learning Syntax and Semantics for Machine Translation. University of Texas AI Colloquium, 2014/10/04; University of Notre Dame CSE Seminar, 2014/10/10; University of Michigan, 2013/10/14.
- Machine translation: what is it and what can('t) it do? Summer Institute of Linguistics, Ukarumpa, Papua New Guinea, 2012/05/30.
- Machine translation for language preservation: Improving access to knowledge and heritage for small languages. Macquarie University, Sydney, 2012/05/18.
- Synchronous Grammars, tutorial given at TAG+11, 2012/09/26.
- Hope and fear for discriminative training of statistical translation models. Columbia University, 2011/05/17.
- Learning to translate with source and target syntax. IBM TJ Watson Research Center, 2010/06/14; Carnegie Mellon University, Language Technologies Institute, 2010/06/16.
- Towards tree-to-tree translation. Cambridge University, 2010/02/01; University of Edinburgh, School of Informatics, 2010/02/05.
- What can computational linguistics do for computational stemmatology? Studia Stemmatologica, University of Helsinki, 2010/01/28.
- Online large-margin training of syntactic and structural translation features. Johns Hopkins University, CLSP Seminar, 2008/09/16.
- Microsoft Research Asia, Beijing, 2007/09/19, 2007/09/26.
- Harvard University, Computer Science Colloquium, 2006/11/09.
- Synchronous Grammars and Tree Transducers, tutorial given at ACL-COLING 2006 with K. Knight, 2006/07/16.
- National University of Singapore, School of Computing, 2006/04/11.
- NIPS 2005 Workshop on Advances in Structured Learning for Text and Speech Processing, 2005/12/09.
- NYU Department of Computer Science, 2005/09/16.
- USC Information Sciences Institute, NL Seminar, 2005/07/06.
- Google, Inc., 2005/06/13.
- From phrase-based towards syntax-based machine translation. JHU CLSP, 2005/02/08.
- Sizing up formal grammars for statistical parsing and translation. Language Weaver, Inc., 2004/06/03.
- Putting formal grammars to work. Univ. of MD Inst. for Advanced Computer Studies, Computational Linguistics Colloquium, 2004/04/26.
- Formal grammars for biological sequence analysis. NYU Department of Computer Science, 2002/04/12.
- Extracting tree adjoining grammars for statistical parsing of English and Chinese. Univ. of MD Inst. for Advanced Computer Studies, Computational Linguistics Colloquium, 2001/08/15.
Grants
- Exploiting Language Information for Situational Awareness. 2015–2019. DARPA LORELEI (subcontract with USC/ISI), $635,000.
- Learning Better Translation Models by Learning More Translation Models. 2015–2016. Google Faculty Research Award, $42,000.
- Language Induction Meets Language Documentation: Leveraging Bilingual Aligned Audio for Learning and Preserving Languages. With S. Bird. 2014–2017. NSF, $470,000.
- Training Machine Translation Models as Deep Architectures. 2013–2014. Google Faculty Research Award, $84,000.
- Automatic Knowledge Acquisition for Language Translation. 2012–2014. DARPA Computer Science Study Panel, $500,000.
- Machine Translation for Language Preservation. With S. Bird. 2011–2012. NSF EAGER, $180,000.
- TAU: Learning Natural Language Structure using Multilingual Texts. 2010–2012. DARPA Computer Science Study Panel, $400,000.
- Phylo: Phylogenetic Reconstruction of Textual Histories. 2010–2011. NSF EAGER, $75,000.
- Structured Learning for Structured Machine Translation. 2009–2010. DARPA Computer Science Study Panel, $100,000.
Professional service
- NAACL executive board, 2011–2012
- Secretary, ACL Special Interest Group on Machine Translation
- Local organizer for NAACL HLT 2010, with E. Hovy, J. May, and J. Riesa
- Editorial board, Computational Linguistics, 2006–2008; J. Artificial Intelligence Research, 2009–2012; Machine Translation Journal, 2012–present
- Action editor, Transactions of the ACL, 2012–present
- Guest editor, ACM Trans. Asian Language Information Processing Special Issue on Machine Translation
- Area chair: ACL 2006, 2008 (machine translation and multilinguality); EMNLP 2007, 2010, IJCNLP 2011 (machine translation)
- Publications chair, EMNLP 2009
- Organizer, NAACL Workshop on Syntax and Structure in Statistical Translation, with D. Wu, 2007–2009
- Program co-chair, 12th International Workshop on Tree Adjoining Grammars and Related Formalisms, 2016
- NSF review panelist, 2002, 2010, 2013, 2014
- Journal reviewer: Artificial Intelligence; IEEE Trans. Audio, Speech, and Language Processing; J. Natural Language Engineering; J. Computer and System Sciences; Science
- Conference reviewer: ACL, NAACL, EACL, IJCNLP, EMNLP, NIPS, ICML, IJCAI, AAAI
- Local site coordinator for North American Computational Linguistics Olympiad, 2009–present
Personal
- Citizenship: U.S.A.
- Professional societies: Association for Computational Linguistics, Society of Catholic Scientists
- Languages: English (native), Mandarin Chinese (basic), Latin and ancient Greek (reading)