Question Rewriting for Conversational Question Answering (Preprint)
Svitlana Vakulenko, Shayne Longpre, Zhucheng Tu, Raviteja Anantha
arXiv, April 2020

Conversational question answering (QA) requires answers conditioned on the previous turns of the conversation. We address the conversational QA task by decomposing it into question rewriting and question answering subtasks, and conduct a systematic evaluation of this approach on two publicly available datasets. Question rewriting is designed to reformulate ambiguous questions, dependent on the conversation context, into unambiguous questions that are fully interpretable outside of the conversation context. Thereby, standard QA components can consume such explicit questions directly. The main benefit of this approach is that the same questions can be used for querying different information sources, e.g., multiple 3rd-party QA services simultaneously, as well as provide a human-readable interpretation of the question in context. To the best of our knowledge, we are the first to evaluate question rewriting on the conversational question answering task and show its improvement over the end-to-end baselines. Moreover, our conversational QA architecture based on question rewriting sets the new state of the art on the TREC CAsT 2019 dataset with a 28% improvement in MAP and 21% in NDCG@3. Our detailed analysis of the evaluation results provide insights into the sensitivity of QA models to question reformulation, and demonstrates the strengths and weaknesses of the retrieval and extractive QA architectures, that should be reflected in their integration.

Paper Abstract
Least Squares Binary Quantization of Neural Networks (Preprint)
Hadi Pouransari, Zhucheng Tu, and Oncel Tuzel
arXiv, March 2020

Quantizing weights and activations of deep neural networks results in significant improvement in inference efficiency at the cost of lower accuracy. A source of the accuracy gap between full precision and quantized models is the quantization error. In this work, we focus on the binary quantization, in which values are mapped to -1 and 1. We provide a unified framework to analyze different scaling strategies. Inspired by the pareto-optimality of 2-bits versus 1-bit quantization, we introduce a novel 2-bits quantization with provably least squares error. Our quantization algorithms can be implemented efficiently on the hardware using bitwise operations. We present proofs to show that our proposed methods are optimal, and also provide empirical error analysis. We conduct experiments on the ImageNet dataset and show a reduced accuracy gap when using the proposed least squares quantization algorithms.

Paper Abstract
An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic Question Answering
Shayne Longpre*, Yi Lu*, Zhucheng Tu*, and Chris DuBois
Proceedings of the 2nd Workshop on Machine Reading for Question Answering, November 2019, Hong Kong, China

To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pre-trained language models, various data sampling strategies, as well as query and context paraphrases generated by back-translation. We find a simple negative sampling technique to be particularly effective, even though it is typically used for datasets that include unanswerable questions, such as SQuAD 2.0. When applied in conjunction with per-domain sampling, our XLNet (Yang et al., 2019)-based submission achieved the second best Exact Match and F1 in the MRQA leaderboard competition.

Paper Abstract
Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures
Zhucheng Tu, Mengping Li, and Jimmy Lin
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, June 2018, New Orleans, USA

We demonstrate the serverless deployment of neural networks for model inferencing in NLP applications using Amazon's Lambda service for feedforward evaluation and DynamoDB for storing word embeddings. Our architecture realizes a pay-per-request pricing model, requiring zero ongoing costs for maintaining server instances. All virtual machine management is handled behind the scenes by the cloud provider without any direct developer intervention. We describe a number of techniques that allow efficient use of serverless resources, and evaluations confirm that our design is both scalable and inexpensive.

Paper Abstract
CNNs for NLP in the Browser: Client-Side Deployment and Visualization Opportunities
Yiyun Liang, Zhucheng Tu, Laetitia Huang, and Jimmy Lin
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, June 2018, New Orleans, USA

We demonstrate a JavaScript implementation of a convolutional neural network that performs feedforward inference completely in the browser. Such a deployment means that models can run completely on the client, on a wide range of devices, without making backend server requests. This design is useful for applications with stringent latency requirements or low connectivity. Our evaluations show the feasibility of JavaScript as a deployment target. Furthermore, an in-browser implementation enables seamless integration with the JavaScript ecosystem for information visualization, providing opportunities to visually inspect neural networks and better understand their inner workings.

Paper Abstract
An Experimental Analysis of the Power Consumption of Convolutional Neural Networks for Keyword Spotting
Raphael Tang, Weijie Wang, Zhucheng Tu, and Jimmy Lin
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018), April 2018, Calgary, Canada

Nearly all previous work on small-footprint keyword spotting with neural networks quantify model footprint in terms of the number of parameters and multiply operations for a feedforward inference pass. These values are, however, proxy measures since empirical performance in actual deployments is determined by many factors. In this paper, we study the power consumption of a family of convolutional neural networks for keyword spotting on a Raspberry Pi. We find that both proxies are good predictors of energy usage, although the number of multiplies is more predictive than the number of model parameters. We also confirm that models with the highest accuracies are, unsurprisingly, the most power hungry.

Paper Abstract
An Exploration of Approaches to Integrating Neural Reranking Models in Multi-Stage Ranking Architectures
Zhucheng Tu, Matt Crane, Royal Sequiera, Junchen Zhang, and Jimmy Lin
Proceedings of the SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR'17), August 2017, Tokyo, Japan

We explore different approaches to integrating a simple convolutional neural network (CNN) with the Lucene search engine in a multi-stage ranking architecture. Our models are trained using the PyTorch deep learning toolkit, which is implemented in C/C++ with a Python frontend. One obvious integration strategy is to expose the neural network directly as a service. For this, we use Apache Thrift, a software framework for building scalable cross-language services. In exploring alternative architectures, we observe that once trained, the feedforward evaluation of neural networks is quite straightforward. Therefore, we can extract the parameters of a trained CNN from PyTorch and import the model into Java, taking advantage of the Java Deeplearning4J library for feedforward evaluation. This has the advantage that the entire end-to-end system can be implemented in Java. As a third approach, we can extract the neural network from PyTorch and "compile" it into a C++ program that exposes a Thrift service. We evaluate these alternatives in terms of performance (latency and throughput) as well as ease of integration. Experiments show that feedforward evaluation of the convolutional neural network is significantly slower in Java, while the performance of the compiled C++ network does not consistently beat the PyTorch implementation.

Paper Abstract
Exploring the Effectiveness of Convolutional Neural Networks for Answer Selection in End-to-End Question Answering
Royal Sequiera, Gaurav Baruah, Zhucheng Tu, Salman Mohammed, Jinfeng Rao, Haotian Zhang, and Jimmy Lin
Proceedings of the SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR'17), August 2017, Tokyo, Japan

Most work on natural language question answering today focuses on answer selection: given a candidate list of sentences, determine which contains the answer. Although important, answer selection is only one stage in a standard end-to-end question answering pipeline. This paper explores the effectiveness of convolutional neural networks (CNNs) for answer selection in an end-to-end context using the standard TrecQA dataset. We observe that a simple idf-weighted word overlap algorithm forms a very strong baseline, and that despite substantial efforts by the community in applying deep learning to tackle answer selection, the gains are modest at best on this dataset. Furthermore, it is unclear if a CNN is more effective than the baseline in an end-to-end context based on standard retrieval metrics. To further explore this finding, we conducted a manual user evaluation, which confirms that answers from the CNN are detectably better than those from idf-weighted word overlap. This result suggests that users are sensitive to relatively small differences in answer selection quality.

Paper Abstract
Prizm: A Wireless Access Point for Proxy-Based Web Lifelogging
Jimmy Lin, Zhucheng Tu, Michael Rose, and Patrick White
Proceedings of the First Workshop on Lifelogging Tools and Applications (LTA 2016), October 2016, Amsterdam, The Netherlands

We present Prizm, a prototype lifelogging device that comprehensively records a user’s web activity. Prizm is a wireless access point deployed on a Raspberry Pi that is designed to be a substitute for the user’s normal wireless access point. Prizm proxies all HTTP(S) requests from devices connected to it and records all activity it observes. Although this particular design is not entirely novel, there are a few features that are unique to our approach, most notably the physical deployment as a wireless access point. Such a package allows capture of activity from multiple devices, integration with web archiving for preservation, and support for offline operation. This paper describes the design of Prizm, the current status of our project, and future plans.

Paper Abstract