Third week of the IBM internship

I’m working on an internal project that motivated me to look into survival analysis. Cool stuff, that. Essentially, you have a bunch of data about the lifetimes of the some objects and potentially related covariates, and from that data you’d like to estimate the functional relationship between the lifetimes and the covariates, in the form of the survival function \(P(T > x) = f(x, X_1, \ldots, X_n).\) It’s not clear that this is the best lens to approach the particular problem I’m looking at (churn)— for instance, you could think of this in machine learning or approximation theory terms rather than statistical terms—, but it’s certainly a very sensible approach.

I still have access to the online Springer books through Caltech, so I started to read Dynamic regression methods for survival analysis, which is more than a bit ambitious given my lack of statistical training. The first chapter’s quite inspiring though: I now want to learn about compensators and martingale central limit theorems. Apparently one can view a lot of statistical estimators as compensators (or something close), and the difference between the estimators and the actual quantities as a martingale (or, more generally, a zero-mean stochastic process). Then the asymptotic error of the estimator can be understood through the central limit theorem (\emph{or}, in the more general case, through empirical process theory— which for various reasons, including the connection to sums of independent r.v.s in Banach spaces, I’m also very interested in learning).

Of course, my first question is: is there a nonasymptotic version of the martingale central limit theorem? And my second question is: do these concepts make sense in the matrix case (whatever that is), and what’s known there? Is this something worth working on? Unfortunately, I know nothing about statistics … I think maybe there’s a relation to the self-normalizing processes that Joel mentioned I should look into. So there’s one more thing to add to my list of too-little-time investigations.

Links to interesting literature:

Pollard’s notes on empirical process theory (the most concrete/approachable I can find)
Sara van de Geer’s notes on empirical process theory (concise and simple)
An introduction to survival analysis