Research

I’m currently working on a diverse set of machine learning projects, including projects related to international trade and the analysis of weather data. Various other projects I’ve worked on are listed below, organized by application area. My PhD dissertation may be found here.

Machine learning for neuroscience

Recent advances in 3D microscopy make it possible to record the neurons of freely moving C. elegans at high frame rates. To study the calcium activity in these neurons, one must track each individual neuron from frame to frame. Most automated tracking methods proposed in the literature focus on immobilized or partially immobilized worms and fail to generalize to freely moving worms. In this work we propose methods for automated tracking of neurons in freely moving worms in the setting where a subset of neurons is marked with red fluorescence.

Statistical machine learning for any domain

Many papers have focused on achieving the best possible performance in a common domain-specific semi-supervised learning task. In this work we instead introduce approaches to end-to-end learning that allow one to jointly learn feature representations from unlabeled data (with or without labeled data) and learn the labels in a domain-agnostic manner. The proposed approaches can be used on any amount of labeled and unlabeled data, gracefully adjusting to the amount of supervision.

Statistical machine learning for computer vision

ConvNets are typically viewed as different in essence from kernel-based methods. In this work we show that this view is unfounded, both formally and empirically. After providing a systematic framework to translate between ConvNets and convolutional kernel networks (CKNs), we demonstrate that ConvNets and their translations into their more principled counterparts, CKNs, perform similarly on landmark computer vision tasks.

Statistical machine learning for marine ecology

Oceanographers at the University of Washington are studying phytoplankton, organisms that are responsible for 50% of the total photosynthesis on Earth. We developed a change-point algorithm that may one day be embedded on board their research vessels to help them adapt their models and data collection in real time. The algorithm will also be used in retrospective analyses. Our work was featured on the UW eScience website.
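To give a flavor of what a change-point method does, here is a minimal sketch that finds a single shift in the mean of a sequence by trying every split and keeping the one with the smallest total squared error. This is a generic textbook illustration, not the algorithm developed in this project; the function name `find_change_point` and the toy series are assumptions made for the example.

```python
def find_change_point(values):
    """Return the index k (1 <= k < n) that best splits `values` into two
    segments with different means, by minimizing the total squared error.
    Hypothetical helper for illustration only."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")

    def sse(segment):
        # Sum of squared deviations from the segment mean.
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_k, best_cost = None, float("inf")
    for k in range(1, n):
        # Cost of modeling values[:k] and values[k:] with separate means.
        cost = sse(values[:k]) + sse(values[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Toy series (made up): the mean jumps between the 4th and 5th observations.
series = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
change_at = find_change_point(series)  # index of the first post-change observation
```

This brute-force search recomputes each segment cost from scratch, so it is quadratic in the series length. A method meant to run in real time would need to handle multiple change points and update incrementally, which this sketch does not attempt.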

Statistical analysis of meteorological models

The effects of a weather model’s parameters on its forecasts are not well understood. In this work we performed a sensitivity analysis of how the parameters of the COAMPS weather model affect the spatial structure of its forecasts. Ideally, this would eventually allow us to tune the parameters to improve forecasts.

Statistical machine learning for healthcare

Sequences of medical billing codes for millions of patients contain a wealth of information about diseases and people’s behaviors. After demonstrating that canonical correlation analysis (CCA) discovers interesting relationships between codes, we explore the effectiveness of using CCA features in predicting elective surgery for diverticulitis.
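For readers unfamiliar with CCA, the sketch below computes canonical correlations between two sets of variables with NumPy. It is a generic illustration on synthetic data, not the pipeline used in this project; the function name `canonical_correlations`, the small ridge term `reg`, and the random inputs are all assumptions made for the example.

```python
import numpy as np


def canonical_correlations(X, Y, reg=1e-8):
    """Return the canonical correlations between the columns of X and Y.
    Hypothetical helper for illustration; `reg` is a small ridge term
    added to the covariances for numerical stability."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariance and cross-covariance matrices.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Whiten each view with a Cholesky factor: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    return np.linalg.svd(M, compute_uv=False)


# Synthetic data: the first column of Y is an exact multiple of a column of X,
# so the leading canonical correlation should be close to 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.column_stack([2.0 * X[:, 0], rng.normal(size=200)])
corrs = canonical_correlations(X, Y)
```

The singular vectors of the same matrix (not returned here) give the projection directions, which is where CCA-derived features for a downstream predictor would come from.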

Machine learning for cyber security

The STUCCO project at Oak Ridge National Lab aims to synthesize data from a variety of sources to help security analysts quickly learn about potential flaws in their systems.

Statistical analysis of international trade

Economic models of multi-product firms have historically focused mainly on how many goods firms produce, rather than what they produce. Recently we developed a model that accounts for correlations in production efficiencies across products to determine what firms co-export, and we applied the model to Chinese export data.

I encountered the same Chinese exports dataset when I was an undergraduate. My undergraduate thesis examined the factors that contribute to the extent of exchange rate pass-through for goods exported by Chinese firms. My advisor was Kala Krishna.