## My podcast masterlist

Here’s an early Christmas gift to you: a list of podcasts I enjoy, for listening while you’re doing all your holiday season travelling.

- APM: Marketplace
- KCRW’s Left, Right, and Center
- Newshour
- BBC World Update: Daily Commute
- Common Sense with Dan Carlin
- PRI’s The World: Latest Edition
- On the Media
- The Young Turks Video Podcast
- Citizen […]

## Mirror descent is, in a precise sense, a second order algorithm

For one of our projects at eBay, I’ve been attempting to do a Poisson MLE fit on a dataset large enough that Fisher scoring is not feasible. The problem is that the data also has such large variance in the scales of the observations that stochastic gradient descent does not work, period, because of […]

## Algebra: it matters

I’m looking at two different models for learning polynomial functions, and trying to determine if they are equivalent. After a couple days of thinking, I’ve reduced the question to the following: Can every symmetric polynomial of degree $$r$$ in $$d$$ variables that has no constant term be written as a sum of the $$r$$-th powers […]

## Julia, once more

Julia + PyCall + CCall + Gadfly or PyPlot (+ Julia Studio ?) looks delicious. The only feature that absolutely needs to be added is shared memory parallelism (why wasn’t this an initial core feature of the language?), but I’m extremely excited by the current awesomeness of the Julia ecosystem. I recommend you get into […]

## a bit on word embeddings

Lately I’ve been working almost exclusively on continuous word representations, with the goal of finding vectorial representations of words which expose semantic and/or syntactic relationships between words. As is typical for any interesting machine learning problem, there is a glut of clever models based on various assumptions (sparsity, hierarchical sparsity, low-rankedness, etc.) that yield respectable […]

## Installing Hadoop on Ubuntu (works for Ubuntu 12.04 and Hadoop 2.4.1)

I’m trying to use LDA on a large amount of data. A quick recap: Tried vowpal wabbit … it’s fast, I’ll give it that, but it’s also useless: the output is dubious (what I think are the topics look like they haven’t changed very much from the prior) *and* I have no idea how it […]

## Sharing numpy arrays between processes using multiprocessing and ctypes

Because of its global interpreter lock, Python (CPython, at least) can’t run threads in parallel, so it doesn’t support real multithreading. To me, this is a ridiculous limitation that should be gotten rid of post-haste: a programming language is not modern unless it supports multithreading. Python supports multiprocessing, but the straightforward manner of using multiprocessing requires you to pass data between processes using pickling/unpickling rather than […]
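The excerpt above is cut off before it shows any code, so what follows is only a minimal sketch of the general idea in the title, not the post's own implementation. It assumes a `float64` array backed by `multiprocessing.Array`; the helper names `as_numpy`, `fill_row`, and `run_demo` are hypothetical.

```python
import ctypes
import multiprocessing as mp

import numpy as np


def as_numpy(shared, shape):
    """View the shared ctypes buffer as a NumPy array (no copy is made)."""
    return np.frombuffer(shared.get_obj(), dtype=np.float64).reshape(shape)


def fill_row(shared, shape, row):
    """Worker (hypothetical): write its row index into one row of the shared array."""
    arr = as_numpy(shared, shape)
    with shared.get_lock():  # mp.Array comes with a lock by default
        arr[row, :] = row


def run_demo(shape=(4, 3)):
    """Start one process per row and return a copy of the filled array."""
    # Zero-initialised buffer of doubles living in shared memory.
    shared = mp.Array(ctypes.c_double, shape[0] * shape[1])
    workers = [
        mp.Process(target=fill_row, args=(shared, shape, r))
        for r in range(shape[0])
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return as_numpy(shared, shape).copy()


if __name__ == "__main__":
    print(run_demo())
```

Only the handle to the shared buffer is passed to each worker; the array contents themselves are never pickled, which is the point of going through `ctypes`.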

## Eigenvector two-condition number for a product of PSD matrices

I’m pushing to submit a preprint on the Nyström method that has been knocking around for the longest time. I find myself running into problems centering around expressions of the type $$B^{-1}A$$, where $$A, B$$ are SPSD matrices satisfying $$B \preceq A$$. This expression will be familiar to numerical linear algebraists: there $$B$$ would be […]

## Canonical Correlation Analysis (CCA)

I am not completely satisfied with the expositions of CCA that I’ve come across, so I decided to write one that reflects my own intuition. CCA is useful in the case where you observe two random variables that are both noisy linear functions of some underlying latent random variable, and you want to use this […]
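One way to write down the setting described above, as a sketch only: the symbols $$z$$, $$A$$, $$B$$, $$\epsilon_x$$, and $$\epsilon_y$$ are notation assumed here, not taken from the post.

$$x = A z + \epsilon_x, \qquad y = B z + \epsilon_y$$

Here $$x$$ and $$y$$ are the two observed random variables, $$z$$ is the shared latent random variable, $$A$$ and $$B$$ are unknown linear maps, and $$\epsilon_x$$, $$\epsilon_y$$ are noise terms.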

## Decision time: MacPorts vs Homebrew vs Fink

My work MacBook Pro recently crapped out on me during an update of the OS (apparently something has a tendency to go wrong with the video card or its driver or something similar during this particular update for this particular model … sigh) so I’ve had the joy of reinstalling my personal ecosystem of software […]