Publications

Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures
Zhucheng Tu, Mengping Li, and Jimmy Lin
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, June 2018, New Orleans, USA

We demonstrate the serverless deployment of neural networks for model inferencing in NLP applications using Amazon's Lambda service for feedforward evaluation and DynamoDB for storing word embeddings. Our architecture realizes a pay-per-request pricing model, requiring zero ongoing costs for maintaining server instances. In fact, all virtual machine management is handled behind the scenes by the cloud provider without any direct developer intervention. We describe a number of techniques that allow efficient use of serverless resources, and evaluations confirm that our design is both scalable and inexpensive.
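The request path described above can be sketched in plain Python: embed the input tokens, pool them into a sentence vector, and score it with a small classifier. Everything here is a hypothetical illustration, not the authors' code: the in-memory `EMBEDDINGS` table stands in for per-request DynamoDB lookups, and the weights are toy values.

```python
import math

# Toy word-embedding table; in the paper's design these vectors
# live in DynamoDB and are fetched on demand per request.
EMBEDDINGS = {
    "good": [0.9, 0.1],
    "bad": [-0.8, 0.2],
    "movie": [0.1, 0.5],
}
DIM = 2

# Toy logistic-regression "model": weights and bias for a binary score.
WEIGHTS = [1.2, -0.4]
BIAS = 0.05

def lambda_handler(event, context=None):
    """Lambda-style entry point: embed, average, and score one sentence."""
    tokens = event["text"].lower().split()
    vectors = [EMBEDDINGS.get(t, [0.0] * DIM) for t in tokens]
    # Average the word vectors into a single sentence vector.
    avg = [sum(v[i] for v in vectors) / max(len(vectors), 1)
           for i in range(DIM)]
    logit = sum(w * x for w, x in zip(WEIGHTS, avg)) + BIAS
    score = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return {"score": score}
```

Because the handler holds no state between invocations, the cloud provider can scale instances up and down freely, which is what makes the pay-per-request pricing model work.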

CNNs for NLP in the Browser: Client-Side Deployment and Visualization Opportunities
Yiyun Liang, Zhucheng Tu, Laetitia Huang, and Jimmy Lin
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, June 2018, New Orleans, USA

We demonstrate a JavaScript implementation of a convolutional neural network that performs feedforward inference completely in the browser. Such a deployment means that models can run completely on the client---on devices ranging from laptops to mobile phones and even "smart home" gadgets---without making backend server requests. This design is useful for applications with stringent latency requirements or low connectivity. Our evaluations show the feasibility of JavaScript as a deployment target for neural network models from a performance perspective. Furthermore, an in-browser implementation enables seamless integration with the JavaScript ecosystem for information visualization, providing opportunities to visually inspect neural networks and better understand their inner workings.
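The client-side computation is an ordinary feedforward pass over the input text. A minimal sketch of the core step of a text CNN, one 1-D convolution filter with ReLU and max-pooling over time, is shown below in Python for readability; the paper's actual implementation is JavaScript, and the function name and sizes here are illustrative.

```python
def conv1d_maxpool(sequence, kernel, bias=0.0):
    """Slide a 1-D kernel over the sequence (valid convolution),
    apply ReLU, then max-pool over time -- the core of a text CNN."""
    k = len(kernel)
    feature_map = []
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        activation = sum(w * x for w, x in zip(kernel, window)) + bias
        feature_map.append(max(activation, 0.0))  # ReLU
    return max(feature_map)  # max-pooling over time

# Example: one filter of width 2 over a length-4 input sequence.
# conv1d_maxpool([1.0, -2.0, 3.0, 0.5], [0.5, 0.5]) takes the max
# over the three windows of the ReLU-activated filter responses.
```

A full model applies many such filters over embedded word vectors and feeds the pooled values to a dense softmax layer, but each step is the same kind of loop, which is why it ports readily to JavaScript.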

An Experimental Analysis of the Power Consumption of Convolutional Neural Networks for Keyword Spotting
Raphael Tang, Weijie Wang, Zhucheng Tu, and Jimmy Lin
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018), April 2018, Calgary, Canada

Nearly all previous work on small-footprint keyword spotting with neural networks quantifies model footprint in terms of the number of parameters and multiply operations for a feedforward inference pass. These values are, however, proxy measures since empirical performance in actual deployments is determined by many factors. In this paper, we study the power consumption of a family of convolutional neural networks for keyword spotting on a Raspberry Pi. We find that both proxies are good predictors of energy usage, although the number of multiplies is more predictive than the number of model parameters. We also confirm that models with the highest accuracies are, unsurprisingly, the most power hungry.
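Energy is the integral of power over the inference interval, so a measurement rig reduces to integrating sampled power readings. A minimal sketch of that reduction (the sampling rate and numbers are illustrative, not the paper's measurement setup):

```python
def energy_joules(power_watts, interval_s):
    """Estimate energy from evenly spaced power samples via the
    trapezoidal rule: E = integral of P(t) dt over the run."""
    total = 0.0
    for p0, p1 in zip(power_watts, power_watts[1:]):
        total += 0.5 * (p0 + p1) * interval_s
    return total

# e.g. one second of a roughly 2 W draw, sampled every 0.25 s:
# energy_joules([2.0, 2.1, 1.9, 2.0, 2.0], 0.25) gives about 2.0 J
```

With per-run energy estimates like this in hand, one can regress energy against parameter counts and multiply counts to see which proxy predicts better.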

An Exploration of Approaches to Integrating Neural Reranking Models in Multi-Stage Ranking Architectures
Zhucheng Tu, Matt Crane, Royal Sequiera, Junchen Zhang, and Jimmy Lin
Proceedings of the SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR'17), August 2017, Tokyo, Japan

We explore different approaches to integrating a simple convolutional neural network (CNN) with the Lucene search engine in a multi-stage ranking architecture. Our models are trained using the PyTorch deep learning toolkit, which is implemented in C/C++ with a Python frontend. One obvious integration strategy is to expose the neural network directly as a service. For this, we use Apache Thrift, a software framework for building scalable cross-language services. In exploring alternative architectures, we observe that once trained, the feedforward evaluation of neural networks is quite straightforward. Therefore, we can extract the parameters of a trained CNN from PyTorch and import the model into Java, taking advantage of the Java Deeplearning4J library for feedforward evaluation. This has the advantage that the entire end-to-end system can be implemented in Java. As a third approach, we can extract the neural network from PyTorch and "compile" it into a C++ program that exposes a Thrift service. We evaluate these alternatives in terms of performance (latency and throughput) as well as ease of integration. Experiments show that feedforward evaluation of the convolutional neural network is significantly slower in Java, while the performance of the compiled C++ network does not consistently beat the PyTorch implementation.
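The "extract and re-implement" strategies rest on the observation that a trained dense layer is just a matrix-vector product plus a nonlinearity, so its parameters can be serialized in a runtime-neutral format and replayed anywhere. A minimal sketch of that round trip in plain Python with hypothetical weights (the paper does this with Deeplearning4J in Java and with a compiled C++ program):

```python
import json

# Hypothetical trained parameters, serialized as a runtime-neutral blob.
exported = json.dumps({
    "weights": [[0.5, -0.25], [0.1, 0.3]],  # 2x2 dense layer
    "bias": [0.0, 0.1],
})

def dense_forward(params_json, x):
    """Re-load exported parameters and run the layer's forward pass:
    y = W @ x + b, followed by ReLU."""
    params = json.loads(params_json)
    W, b = params["weights"], params["bias"]
    y = [sum(w * xi for w, xi in zip(row, x)) + bi
         for row, bi in zip(W, b)]
    return [max(v, 0.0) for v in y]
```

The trade-off the paper measures is exactly the one this sketch hides: a hand-rolled forward pass is easy to host in any language, but matching the throughput of an optimized framework is not guaranteed.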

Exploring the Effectiveness of Convolutional Neural Networks for Answer Selection in End-to-End Question Answering
Royal Sequiera, Gaurav Baruah, Zhucheng Tu, Salman Mohammed, Jinfeng Rao, Haotian Zhang, and Jimmy Lin
Proceedings of the SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR'17), August 2017, Tokyo, Japan

Most work on natural language question answering today focuses on answer selection: given a candidate list of sentences, determine which contains the answer. Although important, answer selection is only one stage in a standard end-to-end question answering pipeline. This paper explores the effectiveness of convolutional neural networks (CNNs) for answer selection in an end-to-end context using the standard TrecQA dataset. We observe that a simple idf-weighted word overlap algorithm forms a very strong baseline, and that despite substantial efforts by the community in applying deep learning to tackle answer selection, the gains are modest at best on this dataset. Furthermore, it is unclear if a CNN is more effective than the baseline in an end-to-end context based on standard retrieval metrics. To further explore this finding, we conducted a manual user evaluation, which confirms that answers from the CNN are detectably better than those from idf-weighted word overlap. This result suggests that users are sensitive to relatively small differences in answer selection quality.
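The idf-weighted word overlap baseline scores a candidate sentence by summing the idf of the question terms it contains, so rarer shared terms count for more. A minimal sketch (the toy corpus and whitespace tokenization are illustrative):

```python
import math

def idf_table(documents):
    """idf(w) = log(N / df(w)) over a collection of tokenized documents."""
    n = len(documents)
    df = {}
    for doc in documents:
        for w in set(doc):
            df[w] = df.get(w, 0) + 1
    return {w: math.log(n / count) for w, count in df.items()}

def overlap_score(question, candidate, idf):
    """Sum the idf of question terms that also appear in the candidate."""
    shared = set(question) & set(candidate)
    return sum(idf.get(w, 0.0) for w in shared)

# Rank candidate sentences for a question by overlap_score, descending.
```

Despite its simplicity, a scorer of this form is the strong baseline the paper finds hard to beat on TrecQA by standard retrieval metrics.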

Prizm: A Wireless Access Point for Proxy-Based Web Lifelogging
Jimmy Lin, Zhucheng Tu, Michael Rose, and Patrick White
Proceedings of the First Workshop on Lifelogging Tools and Applications (LTA 2016), October 2016, Amsterdam, The Netherlands

We present Prizm, a prototype lifelogging device that comprehensively records a user’s web activity. Prizm is a wireless access point deployed on a Raspberry Pi that is designed to be a substitute for the user’s normal wireless access point. Prizm proxies all HTTP(S) requests from devices connected to it and records all activity it observes. Although this particular design is not entirely novel, there are a few features that are unique to our approach, most notably the physical deployment as a wireless access point. Such a package allows capture of activity from multiple devices, integration with web archiving for preservation, and support for offline operation. This paper describes the design of Prizm, the current status of our project, and future plans.
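A proxy of this kind ultimately reduces each observed request to a log record. A minimal sketch of one such record as a JSON-lines entry, one object per proxied request; the field names here are illustrative, not Prizm's actual log format:

```python
import json

def log_record(timestamp, client_mac, method, url, status):
    """Serialize one observed HTTP(S) request as a JSON-lines entry."""
    return json.dumps({
        "ts": timestamp,
        "client": client_mac,  # which connected device made the request
        "method": method,
        "url": url,
        "status": status,
    }, sort_keys=True)

# Appending one record per request yields an activity log that can
# later feed a web archive or support offline review.
```

Keeping the record keyed by client device is what lets a single access point attribute captured activity across the user's laptops, phones, and other gadgets.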
