6 Survival Task
This page is a work in progress and minor changes will be made over time.
This final section of this part of the book brings everything together: it considers the different prediction types that might be of interest in the survival analysis context and introduces the notion of a survival task more formally. Throughout this chapter let \(\mathcal{X}\subseteq \mathbb{R}^{n \times p}\) be the feature space.
A general survival prediction problem is one in which (Section 3.1):
- a survival dataset, \(\mathcal{D}\), is split for training, \(\mathcal{D}_{train}\), and testing, \(\mathcal{D}_{test}\);
- a survival model is fit on \(\mathcal{D}_{train}\); and
- the model predicts a representation of the unknown true survival time, \(Y\), given \(\mathcal{D}_{test}\).
The process of fitting is model-dependent, and can range from non-parametric methods and maximum likelihood estimation of model parameters to machine learning approaches. The model fitting process is discussed at a high level in Section 3.1 and concrete algorithms are discussed in Part III of this book. The different survival problems are separated by prediction types or prediction problems, which can also be thought of as predictions of different representations of \(Y\). We consider four commonly used prediction types:
- The relative risk of an individual experiencing an event: A single continuous ranking.
- The time until an event occurs: A single continuous value.
- The prognostic index for a model: A single continuous value.
- The survival distribution: A probability distribution.
The first three of these are referred to as deterministic as they predict a single value, whereas the fourth is probabilistic and returns a full survival distribution. Definitions of these are expanded on below, but first note that survival prediction differs from prediction in other fields in two respects:
- The observed data used for model training (observed times \(T\), status indicator \(\Delta\)) is different from the outcome of interest (event times \(Y\)). This differs from, say, standard regression in which the same object (a single continuous variable) is used for fitting and predicting.
- With the exception of the time-to-event prediction, the prediction types do not predict the expectation \(\mathbb{E}(Y)\), which is often of interest, but some other (related) quantity.
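The first point can be illustrated with a minimal sketch. It assumes right-censoring, where a hypothetical censoring time \(C\) (a symbol not used in this chapter) hides the event time whenever censoring occurs first, so that \(T = \min(Y, C)\) and \(\Delta = \mathbb{I}(Y \leq C)\). The function name `observe` and the example values are illustrative only.

```python
def observe(event_times, censor_times):
    """Map true event times Y and censoring times C to the observed data (T, Delta).

    Assumes right-censoring: T = min(Y, C) and Delta = 1 if the event was observed.
    """
    observed = [min(y, c) for y, c in zip(event_times, censor_times)]
    status = [1 if y <= c else 0 for y, c in zip(event_times, censor_times)]
    return observed, status


# Hypothetical true event times (never fully seen in practice) and censoring times.
y = [2.0, 5.0, 7.5, 1.2]
c = [3.0, 4.0, 9.0, 1.0]

t, delta = observe(y, c)
# A model is trained on (t, delta) but is asked to say something about y.
```

For the second and fourth subjects the event time is hidden, which is why a model trained on \((T, \Delta)\) is not trained on the same object it is asked to predict.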
Survival prediction problems must be clearly separated as they are inherently incompatible. For example, it is not meaningful to compare a relative risk prediction from one model to a survival distribution prediction of another. Whilst these prediction types are separated above, they can be viewed as special cases of each other. Both (1.) and (2.) may be viewed as variants of (3.); and (1.), (2.), and (3.) can all be derived from (4.); this is elaborated on below and discussed fully in Chapter 19.
6.1 Predicting Risks
This is a common survival problem and is defined as predicting a continuous rank for an individual’s relative risk of experiencing the event. For example, given three subjects, \(\{i,j,k\}\), a relative risk prediction may predict the risk of event as \(\{0.1, 0.5, 10\}\) respectively. From these predictions, the following types of conclusions can be drawn:
- Conclusions comparing subjects. For example, \(i\) is at the least risk; the risk of \(j\) is only slightly higher than that of \(i\) but the risk of \(k\) is considerably higher than \(j\); the corresponding ranks for \(i,j,k,\) are \(1,2,3\);
- Conclusions comparing risk groups. For example, thresholding the risks at \(1.0\) means that \(i\) and \(j\) are in a low-risk group whilst \(k\) is in a high-risk group.
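Both kinds of conclusion can be reproduced with a short sketch using the example risks above. The helper names `rank_by_risk` and `risk_groups` are hypothetical, and ties between risks are not handled.

```python
def rank_by_risk(risks):
    """Assign rank 1 to the lowest predicted risk, rank 2 to the next, and so on."""
    ordered = sorted(risks, key=risks.get)
    return {subject: position for position, subject in enumerate(ordered, start=1)}


def risk_groups(risks, threshold):
    """Label subjects 'high' if their risk exceeds the threshold, else 'low'."""
    return {s: ("high" if r > threshold else "low") for s, r in risks.items()}


# Relative risk predictions for subjects i, j, k from the example in the text.
risks = {"i": 0.1, "j": 0.5, "k": 10.0}

ranks = rank_by_risk(risks)
groups = risk_groups(risks, threshold=1.0)
```

Here `ranks` places \(i,j,k\) at \(1,2,3\) and `groups` puts \(i\) and \(j\) in the low-risk group and \(k\) in the high-risk group, matching the conclusions listed above.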
Whilst many important conclusions can be drawn from these predictions, the values themselves have no meaning when not compared to other individuals. Interpretation of these rankings depends on the model class (for example, PH and AFT models have opposite interpretations, Chapter 13) and its parametrization or implementation in specific software. For some models a higher ranking implies higher risk, whereas others may assume that a higher ranking implies lower risk. In this book, a higher ranking will always imply a higher risk of event (as in the example above).
Predicting rankings is the primary form of the survival ranking task, defined by predicting a continuous value, \(g: \mathcal{X}\rightarrow \mathcal{R}\) where \(\mathcal{R}\subseteq \mathbb{R}\).
6.2 Predicting Survival Times
Predicting a time to event is the problem of predicting the expectation \(\hat{y}=\mathbb{E}(Y|\mathbf{x})\). A time-to-event prediction is a special case of a ranking prediction as an individual with a longer survival time will have a lower overall risk: if \(\hat{y}_i,\hat{y}_j\) and \(\hat{r}_i,\hat{r}_j\) are survival time and ranking predictions for subjects \(i\) and \(j\) respectively, then \(\hat{y}_i > \hat{y}_j \Rightarrow \hat{r}_i < \hat{r}_j\).
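One simple way to turn survival time predictions into rankings that satisfy this implication is to negate them. This is only a sketch of one possible mapping (any strictly decreasing transformation would do), and the function names are hypothetical.

```python
def times_to_risks(predicted_times):
    """Convert predicted survival times to risks by negation (longer time, lower risk)."""
    return [-t for t in predicted_times]


def order_is_reversed(predicted_times, risks):
    """Check that y_a > y_b implies r_a < r_b for every pair of subjects."""
    n = len(predicted_times)
    return all(
        risks[a] < risks[b]
        for a in range(n)
        for b in range(n)
        if predicted_times[a] > predicted_times[b]
    )


y_hat = [12.0, 3.5, 8.0]        # illustrative predicted survival times
r_hat = times_to_risks(y_hat)   # corresponding risk rankings

consistent = order_is_reversed(y_hat, r_hat)
```

The resulting values of `r_hat` carry only ordering information; unlike `y_hat`, they have no interpretation as times.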
For practical purposes, the expected time-to-event would be the ideal prediction type as it is easy to interpret and communicate. However, this type of prediction is rare for multiple reasons. For one, a usual loss based on \(f(y_i)\), or on some difference between true and predicted values, \(y_i-\hat{y}_i\), is not suitable for censored data, as \(y_i\) is not observed for some observations. Direct estimation/prediction of \(\hat{y}_i = \mathbb{E}(Y|\mathbf{x}_i)\) therefore requires some imputation of censored observations (and evaluation on new data can also only be done on observed or imputed values).
Alternatively, one could derive the expectation by predicting the survival distribution while taking into account the censoring and obtain a time-to-event prediction by calculating expected values, but this brings its own challenges and pitfalls (see Section 6.4 below for details).
Predicting survival times is the deterministic survival task, defined by predicting a continuous value in the non-negative reals, and is specified by \(g: \mathcal{X}\rightarrow \mathbb{R}_{\geq 0}\). See Section 24.1 for practical discussion around predicting in \(\mathbb{R}_{\geq 0}\) vs. \(\mathbb{R}_{>0}\) and continuous vs. discrete time representations. Formally, whilst this is a special case of the ranking task with \(\mathcal{R}\subseteq \mathbb{R}_{\geq 0}\), the distinction is important as a ‘deterministic’ prediction specifically refers to forecasting a single determined outcome with a meaningful interpretation, whereas the ‘ranking’ task is not a deterministic forecast of an event.
6.3 Prognostic Index Predictions
In medical terminology (which is often used in survival analysis), a prognostic index is a tool that predicts outcomes based on risk factors. Given covariates, \(\mathbf{X}\in \mathbb{R}^{n \times p}\), and coefficients, \(\boldsymbol{\beta}\in \mathbb{R}^p\), the linear predictor is defined as \(\boldsymbol{\eta}:= \mathbf{X}\boldsymbol{\beta}\). Applying some function \(g\), which could simply be the identity function \(g(x) = x\), yields a prognostic index, \(g(\boldsymbol{\eta})\). A prognostic index can serve several purposes, including:
- Scaling or normalization – simple functions to scale the linear predictor can better support interpretation and visualisation;
- Capturing non-linear effects – for example the Cox PH model (Chapter 13) applies the transformation \(g(\boldsymbol{\eta}) = \exp(\boldsymbol{\eta})\) to capture more complex relationships between features and outcomes;
- Aiding in interpretability – in some cases this could simply be \(g(\boldsymbol{\eta}) = -\boldsymbol{\eta}\) to ensure the ‘higher value implies higher risk’ interpretation.
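The construction above can be sketched with plain Python lists. The covariates and coefficients are made up for illustration, and `linear_predictor` is a hypothetical helper that computes \(\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta}\) row by row.

```python
import math


def linear_predictor(X, beta):
    """Compute eta = X beta, one value per row (subject) of X."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


# Illustrative data: n = 3 subjects, p = 2 covariates.
X = [[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]]
beta = [0.4, -0.2]

eta = linear_predictor(X, beta)

# Three choices of g applied elementwise to the linear predictor.
pi_identity = list(eta)                 # g(eta) = eta
pi_exp = [math.exp(e) for e in eta]     # g(eta) = exp(eta), as in the Cox PH model
pi_negated = [-e for e in eta]          # g(eta) = -eta, flips the direction of risk
```

The identity and exponential transformations preserve the ordering of subjects, whereas negation reverses it, which is what makes it useful for enforcing the ‘higher value implies higher risk’ convention.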
A prognostic index is a special case of the survival ranking task, assuming that there is a one-to-one mapping between the prediction and expected survival times. Once again, it is assumed in this book that a higher value for the prognostic index implies higher risk of event.
6.4 Predicting Distributions
Predicting a survival distribution refers specifically to predicting the distribution of a subject’s survival time, i.e., modelling the distribution of the event occurring over \(\mathbb{R}_{\geq 0}\). Therefore, this is seen as the probabilistic analogue to the deterministic time-to-event prediction.
Distributional prediction can, in theory, target any of the quantities introduced in Section 4.1, but predicting \(S(t)\) and/or \(h(t)\) is most common. Hazard-based approaches are particularly relevant for non- and semi-parametric estimation of the distribution, where no (or few) assumptions are made about the underlying distribution of event times.
As mentioned above, all prediction types can theoretically be derived from a survival distribution prediction. For example, a time-to-event prediction can be obtained via \(\mathbb{E}(Y|\mathbf{x}) = \int_0^\infty \hat{S}(t) \, dt\). However, for non-parametric methods the estimated cdf is often improper in the presence of censoring and thus integration requires extrapolation of the cdf (Sonabend, Bender, and Vollmer 2022). For parametric models, the distribution of event times is fully specified once the parameters of the assumed distribution have been estimated. However, if the parameters were estimated based on only a small subset of the possible domain of \(Y\), this essentially still constitutes extrapolation and will in most cases yield implausible predictions. A popular alternative is therefore to estimate the restricted mean survival time (RMST; Han and Jung (2022); Andersen, Hansen, and Klein (2004)), which integrates the survival function only up to a chosen time horizon.
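The extrapolation problem and the restricted mean can be made concrete with a small sketch. It assumes the predicted survival function is a right-continuous step function (as produced by many non-parametric estimators) and computes \(\int_0^\tau \hat{S}(t)\,dt\) for a horizon \(\tau\). The function `rmst`, the horizon \(\tau\), and the example curve are illustrative; carrying the last survival value forward beyond the final time point is just one possible extrapolation choice.

```python
def rmst(times, surv, tau):
    """Area under a right-continuous step survival curve on [0, tau].

    times: increasing time points at which the curve drops
    surv:  survival probability from each time point onwards
    The curve equals 1 before the first time point; beyond the last time
    point the final value is carried forward (an extrapolation assumption).
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(times, surv):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)   # rectangle up to the next drop
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)  # final piece up to the horizon


# Illustrative improper curve: it stops at 0.2 rather than reaching 0.
times = [1.0, 2.0, 4.0]
surv = [0.8, 0.5, 0.2]

within_data = rmst(times, surv, tau=4.0)     # uses only the observed range
extrapolated = rmst(times, surv, tau=100.0)  # dominated by the carried-forward tail
```

Because the curve never reaches zero, the unrestricted integral grows without bound as \(\tau\) increases under this extrapolation, whereas the restricted mean at \(\tau = 4\) depends only on the part of the curve that was actually estimated.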
Predicting survival distributions is a type of probabilistic survival task, defined by predicting a conditional distribution over the non-negative reals, \(g: \mathcal{X}\rightarrow \mathcal{S}\) where \(\mathcal{S}\subseteq \operatorname{Distr}(\mathbb{R}_{\geq 0})\) is a convex set of distributions on \(\mathbb{R}_{\geq 0}\).