My research aims to develop natural language models that leverage long text sequences, generalize well from a few examples, and are computationally efficient. Such models can benefit core tasks such as language modeling and downstream tasks involving classification, translation, or alignment.
→ Natural Language Processing
language modeling, machine translation, document modeling, style transfer, multilinguality, sentiment analysis and summarization
→ Machine Learning
weakly-supervised learning, attention mechanisms, long sequence modeling, conditional generation, distance metric learning


  • groc - a PyTorch implementation of grounded compositional output embeddings for adaptive language modeling, which was presented at EMNLP 2020.
    PDF Code
  • drill - a PyTorch implementation of deep residual output embedding layers for neural language generation, which was presented at ICML 2019.
    PDF Code
  • gile - a Keras implementation of generalized input-label embeddings for low-resource and zero-resource text classification, which was presented in TACL 2019.
    PDF Code
  • mhan - a Keras implementation of multilingual hierarchical attention networks for document classification, which was presented at IJCNLP 2017.
    PDF Code Demo
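Several of the projects above score outputs (words or class labels) through embeddings of those outputs rather than through a separate weight vector per output. The sketch below shows that general idea for text classification. It is a simplified illustration, not the gile or drill code. It assumes tanh projections of the input and the label embeddings into a shared space, a dot-product compatibility score, and an independent sigmoid per label. The names `label_scores`, `W_in` and `W_lab` are hypothetical. Because a label is represented only by the embedding of its description, an unseen label can be scored without training any new parameters.

```python
import numpy as np

def label_scores(h, E, W_in, W_lab):
    """Score every label for one input via a shared input-label space (illustrative sketch).

    h:     (d_in,) encoded input text
    E:     (num_labels, d_lab) embeddings of label descriptions
    W_in:  (d_joint, d_in) hypothetical input projection
    W_lab: (d_joint, d_lab) hypothetical label projection
    """
    u = np.tanh(W_in @ h)                  # (d_joint,) projected input
    V = np.tanh(E @ W_lab.T)               # (num_labels, d_joint) projected labels
    logits = V @ u                         # (num_labels,) compatibility scores
    return 1.0 / (1.0 + np.exp(-logits))   # independent sigmoid per label

# Toy example with random parameters (no trained model involved).
rng = np.random.default_rng(0)
h = rng.normal(size=8)
E = rng.normal(size=(5, 6))                # 5 known labels
W_in = rng.normal(size=(4, 8))
W_lab = rng.normal(size=(4, 6))
scores = label_scores(h, E, W_in, W_lab)

# Add an unseen label by embedding its description; no parameters change.
E_new = np.vstack([E, rng.normal(size=(1, 6))])
scores_new = label_scores(h, E_new, W_in, W_lab)
```

Adding a label leaves the scores of the existing labels unchanged, since each label is scored independently from its own embedding.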


  • DRPL - A Benchmark for Document Relation Prediction and Localization (873K documents, 656K pairs).
  • DW - Deutsche-Welle dataset for multilingual news text classification in 8 languages (600K documents, 5.5K classes).
  • HATDOC - Human Attention for Document Classification dataset for evaluation of attention-based models in aspect-based sentiment classification (50K documents, 1.6K sentences, 3 classes).
  • MVSO - Multilingual Visual Sentiment Ontology with over 15K hierarchically organized visual concepts from 12 languages for sentiment concept detection (7.36M images, 15.6K labels).
  • TED - A Lecture Recommendation Dataset with User Ratings and Comments (100K ratings, 200K comments).