Improved representation learning for semantic role labeling

Semantic role labeling (SRL) is a shallow semantic parsing task that determines who did what to whom, when, and where by recovering the latent predicate-argument structure of a sentence. SRL is a fundamental problem in NLP that is also useful in applications such as question answering, machine translation, and information extraction. I've been working on reducing labeling errors in a state-of-the-art SRL model called Linguistically-Informed Self-Attention (LISA), developed at UMass by Emma Strubell et al. LISA is a neural network model that performs multi-task learning across dependency parsing, part-of-speech tagging, predicate detection, and SRL.

Error analysis on the model revealed that if labeling errors alone were fixed, the score would improve by 5.8 absolute F1. That is, in many cases the predicates and arguments were identified correctly but classified incorrectly. My analysis further showed that 31% of the labeling errors were due to confusion among core arguments in the PropBank label set (the verb-specific roles ARG0-ARG5). While PropBank is a useful semantic formalism, its core labels are coarse-grained and not tied to a single consistent role: the meaning of a label like ARG2 changes from one predicate to another, which makes these roles difficult for the model to learn. To fix this, we're augmenting PropBank labels with finer-grained VerbNet labels: the model first predicts VerbNet roles, then uses those predictions to compose an auxiliary role representation that is fed into the PropBank SRL classifier.
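
At a high level, the composition can be sketched as follows (a minimal PyTorch sketch; the layer names and dimensions are hypothetical, and the token representations would come from LISA's shared self-attention encoder): score VerbNet roles, turn the scores into a soft assignment, take a probability-weighted sum of learned role embeddings, and concatenate the result with the encoder output for the PropBank classifier.

```python
import torch
import torch.nn as nn

class AuxiliaryRoleComposer(nn.Module):
    """Sketch: compose a soft VerbNet role representation for PropBank SRL.

    Hypothetical names and sizes; token_reprs would come from the shared
    self-attention encoder in LISA.
    """
    def __init__(self, hidden_dim=512, num_verbnet_roles=30, role_emb_dim=64):
        super().__init__()
        self.verbnet_scorer = nn.Linear(hidden_dim, num_verbnet_roles)
        self.role_embeddings = nn.Embedding(num_verbnet_roles, role_emb_dim)

    def forward(self, token_reprs):
        # token_reprs: (batch, seq_len, hidden_dim)
        logits = self.verbnet_scorer(token_reprs)      # VerbNet role scores
        probs = torch.softmax(logits, dim=-1)          # soft role assignment
        aux = probs @ self.role_embeddings.weight      # weighted sum of role embeddings
        # The PropBank classifier consumes the augmented representation;
        # the VerbNet logits get their own supervised loss.
        return torch.cat([token_reprs, aux], dim=-1), logits
```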

Jun 2018 - Present

Efficient Graph-Based Word Sense Induction

arxiv | poster

This project was undertaken with the hypothesis that resolving polysemy would help improve sentiment analysis. Polysemy is the phenomenon of a single word having multiple senses, such as 'bank' the financial institution and 'bank' as in 'river bank'. The task of selecting the right sense for a given context is called word sense disambiguation, while the unsupervised discovery of latent senses is called word sense induction (WSI). We developed an efficient method to perform WSI using graph-based clustering.

Typically, graph-based clustering methods for WSI construct an 'ego-network' by finding the nearest neighbors of the target word in the word-embedding space. However, this can be computationally expensive if the graph is large, so we instead proposed to group words into basis indexes that resemble topics, and then construct a graph in which each node is a basis index relevant to the target word. To obtain these basis indexes, we use Distributional Inclusion Vector Embeddings (DIVE), developed by Haw-Shiuan Chang et al. Sense clusters are then obtained by partitioning the basis indexes with spectral clustering. We represent each sense cluster by a sense embedding: the average of the topic embeddings in the cluster, weighted by their relevance to the target word.
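
A minimal sketch of this step, assuming we already have an embedding and a relevance score for each basis index (the function and variable names are hypothetical):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def induce_senses(basis_embs, relevance, n_senses=3):
    """basis_embs: (n_bases, dim) embeddings of the basis indexes;
    relevance: (n_bases,) relevance of each basis index to the target word."""
    # Build the affinity graph from cosine similarities (clipped to be non-negative)
    normed = basis_embs / np.linalg.norm(basis_embs, axis=1, keepdims=True)
    affinity = np.clip(normed @ normed.T, 0.0, None)
    labels = SpectralClustering(
        n_clusters=n_senses, affinity="precomputed", random_state=0
    ).fit_predict(affinity)
    # Each sense embedding is the relevance-weighted average of its cluster
    senses = []
    for k in range(n_senses):
        mask = labels == k
        weights = relevance[mask] / relevance[mask].sum()
        senses.append((weights[:, None] * basis_embs[mask]).sum(axis=0))
    return np.stack(senses)
```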

We then perform expectation-maximization to refine the sense embeddings, where the E-step is replacing all words in the corpus with the sense it represents, and the M-step is to retrain word embeddings using the induced senses. Our method beats the previous state-of-the-art on several word context relevance tasks while producing more interpretable sense clusters more efficiently. While we haven't yet been able to correlate better word sense disambiguation with improvement in sentiment analysis, we plan to get back to this task in the near future.
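
The loop looks roughly like this (a simplified sketch with hypothetical data structures; recomputing the sense embeddings from the retrained vectors is omitted for brevity):

```python
import numpy as np
from gensim.models import Word2Vec

def refine(corpus, sense_embs, word_vecs, n_iters=3):
    """corpus: tokenized sentences; sense_embs: word -> (n_senses, dim);
    word_vecs: word -> (dim,) vectors used to embed contexts."""
    for _ in range(n_iters):
        # E-step: relabel each polysemous word with its nearest sense,
        # scored against the average embedding of the surrounding words.
        relabeled = []
        for sent in corpus:
            new_sent = []
            for i, w in enumerate(sent):
                if w in sense_embs:
                    ctx = [word_vecs[c] for j, c in enumerate(sent)
                           if j != i and c in word_vecs]
                    if ctx:
                        sense_id = int(np.argmax(sense_embs[w] @ np.mean(ctx, axis=0)))
                        w = f"{w}#{sense_id}"
                new_sent.append(w)
            relabeled.append(new_sent)
        # M-step: retrain word embeddings on the sense-annotated corpus
        model = Word2Vec(sentences=relabeled, vector_size=100, min_count=1)
        word_vecs = {w: model.wv[w] for w in model.wv.index_to_key}
    return word_vecs
```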

Feb 2018 - Apr 2018

Low-shot visual recognition for faces

report | code

Low-shot learning, or the ability to learn from a small number of examples, is a relevant problem in the domain of facial recognition, where access to training data is limited by cost and privacy issues. We explored a solution based on data augmentation by hallucinating new examples, which researchers at Facebook AI Research found to work well for classification on ImageNet. We used a subset of MS-Celeb-1M as training data.

In the low-shot learning setup, there are a fixed number of base classes, for which a large number of training examples are available, and a set of novel classes, for which only a few training examples are available. The classifier is then evaluated on its ability to correctly classify both the base and novel classes. Our data augmentation method creates new examples for the novel classes as follows: the features of each base class are grouped into clusters using k-means, and the difference between two cluster centroids is treated as a transformation, such as turning a front-facing image into a side-facing one. Transformations mined from all base classes are used to train a generator, which then applies them to each novel-class example to hallucinate new examples. We compare to a baseline where images are generated by naive approaches such as jittering, and achieve a significant improvement.
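
A simplified sketch of the transformation-mining step (hypothetical shapes; the full method trains a generator on these centroid differences rather than applying them directly):

```python
import numpy as np
from sklearn.cluster import KMeans

def mine_transformations(base_class_features, n_clusters=10):
    """base_class_features: list of (n_examples, dim) arrays, one per base class."""
    transformations = []
    for feats in base_class_features:
        centroids = KMeans(n_clusters=n_clusters, n_init=10).fit(feats).cluster_centers_
        # Each ordered pair of centroids yields one candidate transformation
        for i in range(n_clusters):
            for j in range(n_clusters):
                if i != j:
                    transformations.append(centroids[j] - centroids[i])
    return np.stack(transformations)

def hallucinate(novel_feat, transformations, k=5):
    # Naive variant: shift a novel example's features by k random transformations
    idx = np.random.choice(len(transformations), size=k, replace=False)
    return novel_feat + transformations[idx]
```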

Mar 2018 - Apr 2018

Visual Place Recognition

report

As a course project for Computer Vision, I worked on automatically identifying the location of a place given only an image and no other metadata. We compared three machine learning models that use different kinds of features and evaluated their suitability for this task. The first approach represents each image with a color histogram; the second uses the GIST global descriptor as the image feature; the third feeds raw images to a convolutional neural network without explicit feature extraction. We used the Google Street View dataset, which contains 62,058 images from three cities: Pittsburgh, Orlando, and New York.
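
For a flavor of the simplest approach, here's how the color-histogram descriptor might be computed (a sketch assuming OpenCV; the bin count is hypothetical):

```python
import cv2

def color_histogram(image_path, bins=8):
    img = cv2.imread(image_path)  # BGR image
    # 3D histogram over the three color channels -> 8*8*8 = 512-dim descriptor
    hist = cv2.calcHist([img], [0, 1, 2], None,
                        [bins, bins, bins],
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()
```

The resulting fixed-length vector can then be fed to any standard classifier to predict the city.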

Oct 2017 - Dec 2017

The Sound of Sirens

This was a project that I worked on over 36 hours at HackUMass 2017, along with some cool undergrads I met at the venue. We built a signaling system that could alert hearing-impaired drivers when a vehicle with sirens is in the vicinity. To detect sirens, we trained a neural network on the UrbanSound dataset, extracting features such as the short-time Fourier transform, mel spectrogram, and spectral contrast. Our model achieved a test accuracy of over 91%. We then interfaced with a Myo wristband using the Myo SDK to make the wristband vibrate every time a siren was detected.
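
The feature extraction can be sketched like this (assuming librosa; the per-clip mean pooling is a simplification):

```python
import numpy as np
import librosa

def siren_features(path):
    y, sr = librosa.load(path, sr=22050)
    stft = np.abs(librosa.stft(y))                               # short-time Fourier transform
    mel = librosa.feature.melspectrogram(y=y, sr=sr)             # mel spectrogram
    contrast = librosa.feature.spectral_contrast(S=stft, sr=sr)  # spectral contrast
    # Summarize each clip by the mean of its frame-level features
    return np.concatenate([stft.mean(axis=1), mel.mean(axis=1), contrast.mean(axis=1)])
```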

Nov 2017 - Nov 2017

CEGAR-based tool for specifying system properties

I worked on this project in collaboration with the Institute of Mathematical Sciences, Chennai, advised by Prof. Ramanujam and Prof. Sheerazuddin. Our objective was to build a tool that makes model checking more accessible to software designers. Model checking is a technique for formally verifying concurrent or distributed systems, ensuring that the system meets specifications such as safety and liveness. Automated model checkers like NuSMV and SPIN take a description of the program along with the desired properties and check whether the program satisfies them. However, the properties typically have to be specified as formulae in Linear Temporal Logic (LTL), which makes it hard to do model checking at the design stage.
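
For a sense of what designers would otherwise write by hand, the liveness property "every request is eventually acknowledged" looks like this in LTL (the proposition names are illustrative):

```latex
\mathbf{G}\,(\mathit{request} \rightarrow \mathbf{F}\,\mathit{acknowledge})
```

where G reads "always" and F reads "eventually".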

To solve this problem, we built a tool based on the Counterexample-Guided Abstraction Refinement (CEGAR) mechanism that generates LTL formulae from properties specified as sequence diagrams. This is done iteratively, helping the user specify the property correctly while concurrently identifying problem areas in the system. That is, the initial 'draft' of the formula is fed to the model checker along with the system description. If the property is violated, the model checker produces a trace: a sequence of states and actions in which the system violates the property, either because of a bug or because the property was incorrectly specified. To help identify the latter, we convert the trace back into a diagram that intuitively shows the sequence of messages that caused the failure. In this manner, the user can iteratively improve both the system and the specification.
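
The workflow can be summarized in pseudocode (every helper here is a hypothetical stand-in for one of the tool's components):

```python
def refine_specification(sequence_diagram, system_description):
    ltl_formula = diagram_to_ltl(sequence_diagram)   # generate the draft LTL formula
    while True:
        result = run_model_checker(system_description, ltl_formula)
        if result.satisfied:
            return ltl_formula                       # property holds; we're done
        # Violation: render the counterexample trace as a sequence diagram so
        # the user can tell a genuine bug from a mis-specified property.
        counterexample_diagram = trace_to_diagram(result.counterexample)
        sequence_diagram, system_description = revise_with_user(
            counterexample_diagram, sequence_diagram, system_description)
        ltl_formula = diagram_to_ltl(sequence_diagram)
```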

Jan 2017 - May 2017

Detecting variability in multi-word expressions

As a research intern at the Computational Linguistics Lab of the Nara Institute of Science and Technology, Japan, I worked on multi-word expressions (MWEs) with Professor Yuji Matsumoto. MWEs are made up of two or more words but tend to act as a single lexical unit, such as 'by the way'. While that expression is fixed and always occurs in exactly one form, some MWEs are flexible and thus harder to deal with, such as 'under the circumstances' occurring as 'under the specific circumstances'. My project was to automatically detect flexible-type MWEs in English and compile them into a dictionary to enable MWE-aware POS tagging.

Starting with a candidate list of over 2,600 MWEs, I implemented a rule-based system to detect occurrences of each MWE and its possible variations in the LDC Gigaword corpus. The initial rules were: allow up to two intervening words at any position in the MWE (for the MWE 'a number of', also look for 'a large number of' and 'a very large number of'); allow interchangeable articles and pronouns ('apple of his/her/their eye'); and allow plural forms of nouns and tense variations in verbs. I then counted the usage of the original MWE against its modified versions and imposed a threshold to classify each MWE as fixed or flexible, as well as to prune rules that didn't work well.
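
The intervening-word rule, for instance, can be implemented with a small regular expression (a simplified sketch of the actual rule system):

```python
import re

def mwe_pattern(mwe):
    """Match an MWE while tolerating up to two intervening words between tokens."""
    gap = r"(?:\w+\s+){0,2}"
    tokens = [re.escape(t) for t in mwe.split()]
    return re.compile(r"\b" + (r"\s+" + gap).join(tokens) + r"\b", re.IGNORECASE)

pattern = mwe_pattern("a number of")
print(bool(pattern.search("There were a very large number of errors.")))  # True
```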

Jun 2016 - Jul 2016

Sentiment analysis for foreign exchange trading

As a data science intern at Serendio Inc., I worked on developing a machine learning model to gauge expert opinion on currency exchange trends using sentiment analysis. The task was to predict whether the sentiment about a specific currency pair, such as USD/EUR, was bullish (favors buying) or bearish (favors selling), based on posts from financial forums such as Bloomberg and Moneycontrol. To gather training data, we scraped posts from StockTwits, a forum where users post their opinions on stock market trends and tag them as bullish or bearish. We then built an ensemble of machine learning models to perform sentiment classification. Specifically, I implemented a Naive Bayes classifier that achieved an accuracy of over 94% on the validation set.
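
A minimal version of that classifier (sketched with scikit-learn on toy examples; the real model was trained on the scraped StockTwits posts):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

posts = ["USD looking strong, going long", "EUR breaking down, time to sell"]
labels = ["bullish", "bearish"]

# Bag-of-words (unigrams + bigrams) features with a multinomial Naive Bayes model
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(posts, labels)
print(model.predict(["strong momentum, buying more"]))
```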

Jan 2016 - Feb 2016