Nystrom vs Random Feature Maps

I haven’t seen a truly convincing study comparing Nystrom approximations to Random Feature Map approximations. On the one hand, a NIPS 2012 paper compared the two and argued that Nystrom approximations are more efficient because the bases they use are adaptive to the problem, whereas those used by RFMs are not. This is an […]
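For concreteness, here is a minimal NumPy sketch (not taken from any of the papers in question) contrasting the two approximations on an RBF kernel; the sizes, the bandwidth `gamma`, and the uniform landmark sampling are all arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 5, 40           # samples, input dimension, approximation rank
gamma = 0.1                    # RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)
X = rng.standard_normal((n, d))

# Exact kernel matrix.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq)

# Nystrom: sample m landmark columns of K, then K_hat = C W^+ C^T.
# The basis (the landmark columns) is drawn from the data itself.
idx = rng.choice(n, m, replace=False)
C = K[:, idx]
W = K[np.ix_(idx, idx)]
K_nys = C @ np.linalg.pinv(W) @ C.T

# Random Fourier features (Rahimi-Recht): z(x) = sqrt(2/m) cos(Wx + b),
# with W sampled from a fixed, data-independent distribution.
Wrf = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, m))
b = rng.uniform(0.0, 2.0 * np.pi, m)
Z = np.sqrt(2.0 / m) * np.cos(X @ Wrf + b)
K_rfm = Z @ Z.T

err_nys = np.linalg.norm(K - K_nys) / np.linalg.norm(K)
err_rfm = np.linalg.norm(K - K_rfm) / np.linalg.norm(K)
```

The contrast the paper draws is visible in the code: the Nystrom basis depends on `X`, while `Wrf` and `b` are sampled without ever looking at the data.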

My podcast masterlist

Here’s an early Christmas gift to you: a list of podcasts I enjoy! For listening while you’re doing all your holiday season travelling.
APM: Marketplace
KCRW’s Left, Right, and Center
Newshour
BBC World Update: Daily Commute
Common Sense with Dan Carlin
PRI’s The World: Latest Edition
On the Media
The Young Turks Video Podcast
Citizen […]

Mirror descent is, in a precise sense, a second order algorithm

For one of our projects at eBay, I’ve been attempting to do a Poisson MLE fit on a dataset large enough that Fisher scoring is not feasible. The problem is that the data also has such large variance in the scales of the observations that stochastic gradient descent does not work, period, because of […]
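As a toy illustration of the kind of update involved (this is nothing like the actual eBay model, just a one-parameter sketch): mirror descent under the negative-entropy mirror map becomes a multiplicative update, so the Poisson rate stays positive no matter how wildly the gradient varies in scale.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.poisson(lam=3.0, size=10_000)
ybar = y.mean()

# Per-sample Poisson negative log-likelihood, constants dropped:
#   f(lam) = lam - ybar * log(lam),  minimized at lam = ybar.
lam = 1.0    # initial rate (must stay positive)
eta = 0.1    # step size
for _ in range(500):
    grad = 1.0 - ybar / lam
    # Mirror step under the negative-entropy mirror map: an additive
    # update on log(lam), i.e. a multiplicative update on lam itself,
    # so the iterate remains positive for any gradient magnitude.
    lam *= np.exp(-eta * grad)
```

Plain gradient descent on `lam` with the same step size can step to a negative rate when `grad` is large; the multiplicative update cannot.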

Algebra: it matters

I’m looking at two different models for learning polynomial functions, and trying to determine whether they are equivalent. After a couple of days of thinking, I’ve reduced the question to the following: Can every symmetric polynomial of degree $$r$$ in $$d$$ variables that has no constant term be written as a sum of the $$r$$-th powers […]
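The smallest nontrivial case gives the flavor of the question (the precise statement is cut off above, so take this only as an illustration of the kind of identity involved): for $$r = 2$$, $$d = 2$$, the symmetric polynomial $$xy$$, which has no constant term, is a signed combination of second powers of linear forms:

$$xy = \tfrac{1}{2}\left[(x+y)^2 - x^2 - y^2\right].$$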

Julia, once more

Julia + PyCall + CCall + Gadfly or PyPlot (+ Julia Studio ?) looks delicious. The only feature that absolutely needs to be added is shared memory parallelism (why wasn’t this an initial core feature of the language?), but I’m extremely excited by the current awesomeness of the Julia ecosystem. I recommend you get into […]

a bit on word embeddings

Lately I’ve been working almost exclusively on continuous word representations, with the goal of finding vectorial representations of words that expose semantic and/or syntactic relationships between them. As is typical for any interesting machine learning problem, there is a glut of clever models based on various assumptions (sparsity, hierarchical sparsity, low-rankedness, etc.) that yield respectable […]
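As a minimal sketch of the low-rankedness assumption in that list (the toy corpus and window size are invented purely for illustration): factor a word–word co-occurrence matrix with a truncated SVD and take the left factors as word vectors.

```python
import numpy as np

# Toy corpus; in practice the counts come from a large corpus.
corpus = ["the cat sat on the mat", "the dog sat on the rug",
          "a cat chased a dog", "the dog chased the cat"]
tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-2 word window.
C = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# Low-rank embedding: truncated SVD of the log-smoothed counts.
U, s, _ = np.linalg.svd(np.log1p(C))
k = 2
E = U[:, :k] * s[:k]          # one k-dimensional vector per word

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# "cat" and "dog" occur in similar contexts, so their vectors
# should end up close under cosine similarity.
sim = cos(E[idx["cat"]], E[idx["dog"]])
```

On a corpus this tiny the geometry is noisy, but the pipeline — counts, smoothing, low-rank factorization — is the same shape as the real low-rank models.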

I’m pushing to submit a preprint on the Nystrom method that has been knocking around for the longest time. I find myself running into problems centering around expressions of the type $$B^{-1}A$$, where $$A, B$$ are SPSD matrices satisfying $$B \preceq A$$. This expression will be familiar to numerical linear algebraists: there $$B$$ would be […]
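A quick numerical sanity check of the key property (with $$B$$ taken to be strictly positive definite so that $$B^{-1}$$ exists; the sizes are arbitrary): since $$B^{-1}A$$ is similar to the symmetric matrix $$B^{-1/2}AB^{-1/2}$$, its eigenvalues are real, and $$B \preceq A$$ forces them all to be at least 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

# Build a strictly positive definite B, then A = B + (PSD matrix),
# so that A - B is PSD and hence B <= A in the Loewner order.
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)
P = rng.standard_normal((n, n))
A = B + P @ P.T

# Eigenvalues of B^{-1} A: real (similarity to B^{-1/2} A B^{-1/2})
# and all >= 1 (since B^{-1/2} A B^{-1/2} >= I when B <= A).
eigs = np.linalg.eigvals(np.linalg.solve(B, A))
```

This is exactly why such expressions feel like preconditioning: the spectrum of $$B^{-1}A$$ is what governs how well $$B$$ serves as a preconditioner for $$A$$.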