Cox Regression and Its Neural Cousin

Time-to-event data sits unhappily inside the regression-and-classification toolkit because it isn’t really either. You’re asking when an event will happen (when will this patient die, when will this component fail, when will this customer churn), but for a substantial fraction of your data the event simply hasn’t happened by the time the study ends. Those observations are right-censored: you know the event hadn’t happened by time t, and that’s all.

A regression model that treats censoring times as event times underestimates survival. A model that throws out censored points discards a substantial fraction of the data and skews what remains toward early events. The right answer is a methodology built specifically for this structure, which is what survival analysis is. Cox regression has been the workhorse since 1972. DeepSurv extends it by replacing the linear predictor with a neural network. This primer walks through both.

It’s monograph № 3 in the representation/methods series.

What it covers

Eight sections, several live figures, about fifteen minutes to read.

§1 — Why survival data is different. The censoring problem. Why this is a genuine statistical structure, not a quirk.

§2 — Censoring, made visible. Live figure showing right-censoring, left-truncation, and what each one does to a naive estimator.
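
To make the bias concrete, here is a minimal sketch comparing a Kaplan-Meier estimate with the two naive shortcuts. It handles right-censoring only (not left-truncation), the toy data is hypothetical, and the function names are illustrative rather than taken from the primer.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : observed durations (event time or censoring time)
    events : 1 if the event was observed, 0 if right-censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Events and censored subjects both leave the risk set.
        n_at_risk -= removed
    return curve


def survival_at(curve, t):
    """Step-function lookup of S(t) on a Kaplan-Meier curve."""
    s = 1.0
    for time, value in curve:
        if time <= t:
            s = value
        else:
            break
    return s


# Hypothetical toy data: 8 subjects, 4 of them right-censored (event = 0).
times = [2, 3, 4, 5, 6, 7, 8, 9]
events = [1, 0, 1, 0, 1, 0, 1, 0]

km = survival_at(kaplan_meier(times, events), 6)

# Naive A: pretend every censoring time is an event time.
naive_as_events = sum(t > 6 for t in times) / len(times)

# Naive B: drop censored subjects entirely.
observed = [t for t, e in zip(times, events) if e == 1]
naive_dropped = sum(t > 6 for t in observed) / len(observed)
```

On this toy data both shortcuts put the probability of surviving past t = 6 below the Kaplan-Meier estimate, which is the direction of bias described above.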

§3 — The two fundamental functions. Survival function S(t) and hazard function h(t). Their definitions, how each can be recovered from the other (the hazard is the negative derivative of log-survival), the geometric intuition.
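
For reference, the standard definitions for a continuous event time can be written as follows. The symbols T (the event time) and f (its density) are notation assumed here, not taken from the primer.

```latex
S(t) = \Pr(T > t), \qquad f(t) = -\frac{d}{dt} S(t)

h(t) = \lim_{\Delta t \to 0}
       \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
     = \frac{f(t)}{S(t)}
     = -\frac{d}{dt} \log S(t)
```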

§4 — Hazard meets survival. Live figure showing the integral relationship between hazard rate and survival probability.
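
A small numerical check of that relationship, as a sketch: integrating the hazard from 0 to t and exponentiating the negative of the result should reproduce the survival probability. The Weibull hazard and its parameters are an arbitrary example chosen because its survival function has a closed form; nothing here comes from the primer's figure.

```python
import math


def weibull_hazard(t, shape=1.5, scale=10.0):
    """Hazard of a Weibull distribution (hypothetical example choice)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def survival_from_hazard(hazard, t, steps=10_000):
    """S(t) = exp(-integral of the hazard from 0 to t), trapezoid rule."""
    dt = t / steps
    cumulative = 0.0
    for k in range(steps):
        a, b = k * dt, (k + 1) * dt
        cumulative += 0.5 * (hazard(a) + hazard(b)) * dt
    return math.exp(-cumulative)


t = 8.0
numeric = survival_from_hazard(weibull_hazard, t)
# Closed-form Weibull survival with the same shape and scale.
closed_form = math.exp(-((t / 10.0) ** 1.5))
```

The two values should agree to several decimal places. Swapping in any other non-negative hazard function gives the survival curve it implies.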

§5 — Cox’s elegant trick. The proportional hazards assumption. Why it lets you estimate covariate effects without ever specifying the baseline hazard. The partial likelihood that makes the estimation possible.
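
As a sketch of why the baseline drops out: under the model h(t | x) = h0(t) · exp(β · x), each observed event contributes the ratio of its own risk to the summed risk of everyone still under observation, and h0(t) cancels in that ratio. The function below computes the negative log partial likelihood in plain Python, assuming no tied event times. The names and toy data are hypothetical.

```python
import math


def neg_log_partial_likelihood(beta, X, times, events):
    """Cox negative log partial likelihood (assumes no tied event times).

    beta   : list of coefficients
    X      : list of covariate rows
    times  : observed durations
    events : 1 = event observed, 0 = right-censored
    """
    scores = [sum(b * x for b, x in zip(beta, row)) for row in X]
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i]
        total += scores[i] - math.log(sum(risk))
    return -total


# Hypothetical toy data: one binary covariate, one censored subject.
X = [[1.0], [0.0], [1.0], [0.0]]
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]

nll_at_zero = neg_log_partial_likelihood([0.0], X, times, events)
nll = neg_log_partial_likelihood([0.7], X, times, events)

# An intercept shifts every score by the same constant, which cancels in
# each ratio, so the partial likelihood cannot see it.
X_with_intercept = [[1.0] + row for row in X]
nll_shifted = neg_log_partial_likelihood([2.5, 0.7], X_with_intercept, times, events)
```

Here `nll` and `nll_shifted` coincide: the loss never sees anything that multiplies every subject's hazard by the same factor, and the baseline is exactly such a factor.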

§6 — The proportional hazards assumption. When it holds, when it doesn’t. The Schoenfeld residual test. How to fix it (stratification, time-varying covariates) when it breaks.
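
A rough sketch of the quantity behind that test: the raw Schoenfeld residual at each event time is the covariate value of the subject who had the event minus the risk-weighted average of the covariate over everyone still at risk. This version assumes a single covariate, no tied event times, and a coefficient `beta` that has already been fitted. It does not implement the formal test, which works with scaled residuals.

```python
import math


def schoenfeld_residuals(beta, x, times, events):
    """Raw Schoenfeld residuals for one covariate, as (event_time, residual)."""
    residuals = []
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # residuals are defined at observed event times only
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk]
        # Risk-weighted mean of the covariate over the risk set.
        expected = sum(w * x[j] for w, j in zip(weights, risk)) / sum(weights)
        residuals.append((t_i, x[i] - expected))
    return sorted(residuals)


# Hypothetical toy data; beta = 0.0 stands in for a fitted coefficient.
x = [1.0, 0.0, 1.0, 0.0]
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]

residuals = schoenfeld_residuals(0.0, x, times, events)
```

If proportional hazards holds, these residuals should show no systematic trend when plotted against time. A visible slope suggests the covariate's effect changes over time.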

§7 — DeepSurv. The neural extension. Replacing the linear combination of covariates with a neural network output, keeping the partial-likelihood loss. When this wins over Cox, when it doesn’t.
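
The swap can be sketched in a few lines: a small network produces one scalar risk score per subject, and that score goes into the same partial-likelihood loss that the linear predictor would. The one-hidden-layer architecture, the weights, and the data below are hypothetical, and averaging over observed events is a normalization assumed here. There is no training loop; in practice you would minimize this loss with an autodiff framework.

```python
import math


def mlp_risk_score(x, W1, b1, w2):
    """One-hidden-layer network mapping covariates to a scalar log-risk."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden))


def partial_likelihood_loss(scores, times, events):
    """Average negative log partial likelihood over observed events.

    Assumes no tied event times and at least one observed event.
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue
        # Log of the summed risk over subjects still under observation.
        log_risk_sum = math.log(
            sum(math.exp(s) for s, t_j in zip(scores, times) if t_j >= t_i)
        )
        total += scores[i] - log_risk_sum
    return -total / sum(events)


# Hypothetical fixed weights: 2 inputs -> 2 hidden units -> 1 output.
W1 = [[0.5, -0.3], [0.1, 0.8]]
b1 = [0.0, 0.1]
w2 = [1.0, -0.5]

# Hypothetical toy data: 4 subjects, one right-censored.
X = [[1.0, 0.2], [0.3, 1.5], [0.0, 0.0], [2.0, -1.0]]
times = [5.0, 3.0, 8.0, 1.0]
events = [1, 1, 0, 1]

scores = [mlp_risk_score(x, W1, b1, w2) for x in X]
loss = partial_likelihood_loss(scores, times, events)

# With the output weights zeroed, every score is 0 and the loss reduces to
# the Cox value at beta = 0.
zero_scores = [mlp_risk_score(x, W1, b1, [0.0, 0.0]) for x in X]
loss_at_zero = partial_likelihood_loss(zero_scores, times, events)
```

Everything specific to survival analysis lives in the loss. The network is an ordinary regressor with a single output, which is why the approach carries over to other architectures.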

§8 — Concordance and other metrics. How to evaluate survival models when “accuracy” doesn’t apply. The C-index. Time-dependent AUC.
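
A minimal sketch of Harrell's C-index: over all comparable pairs, it is the fraction in which the subject who had the event earlier was also given the higher risk score. A pair is comparable only when the earlier time is an observed event. Pairs with tied times are skipped here for simplicity, and the data is hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked in the right order."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable


# Hypothetical toy data: higher score should mean earlier event.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]

c_model = concordance_index(times, events, [0.9, 0.7, 0.8, 0.1])
c_perfect = concordance_index(times, events, [4.0, 3.0, 2.0, 1.0])
c_reversed = concordance_index(times, events, [1.0, 2.0, 3.0, 4.0])
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering. The metric depends only on the ranking of the scores, so it says nothing about calibration.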

Read it

Open the primer →
