4  Survival Analysis

Abstract
TODO (150-200 WORDS)
Major changes expected!

This page is a work in progress and major changes will be made over time.


TODO


In their broadest and most basic definitions, survival analysis is the study of temporal data from a given origin until the occurrence of one or more events or ‘end-points’ (Collett 2014), and machine learning is the study of models and algorithms that learn from data in order to make predictions or find patterns (Hastie, Tibshirani, and Friedman 2001). Reducing either field to these definitions is ill-advised.

This chapter collects terminology utilised in survival analysis (Section 4.1) and machine learning (Section 3.1) in order that this book can cleanly discuss ‘machine learning survival analysis’ (Section 4.4). Once the mathematical setting is set up, the book scope is fully presented in (Section 4.2). Whilst the content of this chapter is not novel with respect to either survival analysis or machine learning separately, this does appear to be the first formulation of the survival analysis machine learning ‘task’ (Király et al. 2021).

4.1 Survival Analysis

Survival analysis is the field of Statistics concerned with the analysis of time-to-event data, which consists of covariates, a categorical (often binary) outcome, and the time until this outcome takes place (the ‘survival time’). As a motivating example of time-to-event data, say 100 patients are admitted to a COVID-19 ward and for each patient the following covariate data are collected: age, weight and sex; additionally for each patient the time until death or discharge is recorded. In the time-to-event dataset, which takes a standard tabular form, each of the 100 patients is a row, with columns consisting of age, weight, and sex measurements, as well as the outcome (death or discharge) and the time to outcome.

Survival analysis is distinct from other areas of Statistics due to the incorporation of ‘censoring’, a mechanism for capturing uncertainty around when an event occurs in the real-world. Continuing the above example, if a patient dies of COVID-19 five dies after admittance, then their outcome is exactly known: they died after five days. Consider now a patient who is discharged after ten days. As death is a guaranteed event they have a true survival time but this may be decades later, therefore they are said to be censored at ten days. This is a convenient method to express that the patient survives up to ten days and their survival status at any time after this point is unknown. Censoring is a unique challenge to survival analysis that attempts to incorporate as much information as possible without knowing the true outcome. This is a ‘challenge’ as statistical models usually rely on learning from observed, i.e. known, outcome data; therefore censoring requires special treatment.

Whilst survival analysis occurs in many fields, for example as ‘reliability analysis’ in engineering and ‘duration analysis’ in economics, in this book the term ‘survival’ will always be used. Moreover the following terminology, analogous to a healthcare setting, are employed: survival analysis (or ‘survival’ for short) refers to the field of study; the event of interest is the ‘event’, or ‘death’; an observation that has not experienced an event is ‘censored’ or ‘alive’; and observations are referred to as ‘observations’, ‘subjects’, or ‘patients’.

Some of the biggest challenges in survival analysis stem from an unclear definition of a ‘survival analysis prediction’ and different (sometimes conflicting) common notations. This book attempts to make discussions around survival analysis clearer and more precise by first describing the mathematical setting for survival analysis in (Section 4.1.1) and only then defining the prediction types to consider in (Section 4.3).

4.1.1 Survival Data and Definitions

Survival analysis has a more complicated data setting than other fields as the ‘true’ data generating process is not directly modelled but instead engineered variables are defined to capture observed information. Let,

  • \(X \ t.v.i. \ \mathcal{X}\subseteq \mathbb{R}^p, p \in \mathbb{N}_{> 0}\) be the generative random variable representing the data features/covariates/independent variables.
  • \(Y \ t.v.i. \ \mathcal{T}\subseteq \mathbb{R}_{\geq 0}\) be the (unobservable) true survival time.
  • \(C \ t.v.i. \ \mathcal{T}\subseteq \mathbb{R}_{\geq 0}\) be the (unobservable) true censoring time.

It is impossible to fully observe both \(Y\) and \(C\). This is clear by example: if an observation drops out of a study then their censoring time is observed but their event time is not, whereas if an observation dies then their true censoring time is unknown. Hence, two engineered variables are defined to represent observable outcomes. Let,

  • \(T := \min\{Y,C\}\) be the observed outcome time.
  • \(\Delta := \mathbb{I}(Y = T) = \mathbb{I}(Y \leq C)\) be the survival indicator (also known as the censoring or event indicator).

Together \((T,\Delta)\) is referred to as the survival outcome or survival tuple and they form the dependent variables. The survival outcome provides a concise mechanism for representing the time of the observed outcome and indicating which outcome (death or censoring) took place.

Now the full generative template for survival analysis is given by \ \((X, \Delta, C, Y, T) \ t.v.i. \ \mathcal{X}\times \{0,1\}\times \mathcal{T}\times \mathcal{T}\times \mathcal{T}\) and with \((X_i, \Delta_i, C_i, Y_i, T_i)\) jointly i.i.d. A survival dataset is defined by \(\mathcal{D}= \{(X_1,T_1,\Delta_1),...,(X_n,T_n,\Delta_n)\}\) where \((X_i,T_i,\Delta_i) \stackrel{i.i.d.}\sim(X,T,\Delta)\) and \(X_i\) is a \(p\)-vector, \(X_i = (X_{i;1},...,X_{i;p})\). Though unobservable, the true outcome times are defined by \((Y_1,C_1),...,(Y_n,C_n)\) where \((Y_i,C_i) \stackrel{i.i.d.}\sim(Y,C)\).

  1. exemplifies a random survival dataset with \(n\) observations (rows) and \(p\) features.
Table 4.1: Theoretical time-to-event dataset. \((Y,C)\) are ‘hypothetical’ as they can never be directly observed. Rows are individual observations, \(X\) columns are features, \(T\) is observed time-to-event, \(\Delta\) is the censoring indicator, and \((Y,C)\) are hypothetical true survival and censoring times.
X X X T \(\Delta\) Y C
\(X_{11}\) \(\cdots\) \(X_{1p}\) \(T_1\) \(\Delta_1\) \(Y_1\) \(C_1\)
\(\vdots\) \(\ddots\) \(\vdots\) \(\vdots\) \(\vdots\) \(\vdots\) \(\vdots\)
\(X_{n1}\) \(\cdots\) \(X_{np}\) \(T_n\) \(\Delta_n\) \(Y_n\) \(C_n\)
  1. exemplifies an observed survival dataset with a modified version of the rats dataset (Therneau 2015).
Table 4.2: rats (Therneau 2015) time-to-event dataset with added hypothetical columns (\(Y,C\)). Rows are individual observations, \(X\) columns are features, \(T\) is observed time-to-event, \(\Delta\) is the censoring indicator, and \((Y,C)\) are hypothetical (here arbitrary values dependent on \((T,\Delta)\)) true survival and censoring times.
litter \((X_{.;1})\) rx \((X_{.;2})\) sexF \((X_{.;3})\) time (T) status (\(\Delta\)) survTime (Y) censTime (C)
1 1 1 101 0 105 101
1 0 1 49 1 49 55
1 0 1 104 0 200 104
2 1 0 91 0 92 91
2 0 0 104 1 104 104
2 0 0 102 1 102 120

Both datasets includes two extra columns, on the right of the triple vertical line, which imagine hypothetical data for the unobserved true survival and censoring times.

Finally the following terms are used frequently throughout this report. Let \((T_i, \Delta_i) \stackrel{i.i.d.}\sim(T,\Delta), i = 1,...,n\), be random survival outcomes. Then,

  • The set of unique or distinct time-points refers to the set of time-points in which at least one observation dies or is censored, \(\mathcal{U}_O := \{T_i\}_{i \in \{1,...,n\}}\).
  • The set of unique death times refers to the set of unique time-points in which death (and not censoring) occurred, \(\mathcal{U}_D := \{T_i : \Delta_i = 1\}_{i \in \{1,...,n\}}\).
  • The risk set at a given time-point, \(\tau\), is the set of subjects who are known to be alive (not dead or censored) just before that time, \(\mathcal{R}_\tau := \{i: T_i \geq \tau\}\) where \(i\) is a unique row/subject in the data.
  • The number of observations alive at \(\tau\) is the cardinality of the risk set, \(|\mathcal{R}_\tau|\), and is denoted by \(n_\tau := \sum_i \mathbb{I}(T_i \geq \tau)\).
  • The number of observations who die at \(\tau\) is denoted by \(d_\tau := \sum_i \mathbb{I}(T_i = \tau, \Delta_i = 1)\).
  • The Kaplan-Meier estimate of the average survival function of the training data survival distribution is the Kaplan-Meier estimator (Section 11.1.1) fit (Section 3.1.1) on training data \((T_i, \Delta_i)\) and is denoted by \(\hat{S}_{KM}\).
  • The Kaplan-Meier estimate of the average survival function of the training data censoring distribution is the Kaplan-Meier estimator fit on training data \((T_i, 1 - \Delta_i)\) and is denoted by \(\hat{G}_{KM}\).

Notation and definitions will be recapped at the start of each chapter for convenience.

4.1.2 Censoring

Censoring is now discussed in more detail and important concepts introduced. Given the survival generating process \((X,T,\Delta)\) with unobservable \((Y,C)\), the event is experienced if \(Y \leq C\) and \(\Delta = 1\) or censored if \(\Delta = 0\).

#### Censoring ‘Location’ {.unnumbered .unlisted}

Right-censoring is the most common form of censoring in survival models and it occurs either when a patient drops out (but doesn’t experience the event) of the study before the end and thus their outcome is unknown, or if they experience the event at some unknown point after the study end. Formally let \([\tau_l, \tau_u]\) be the study period for some, \(\tau_l,\tau_u \in \mathbb{R}_{\geq 0}\). Then right-censoring occurs when either \(Y > \tau_u\) or when \(Y \in [\tau_l,\tau_u]\) and \(C \leq Y\). In the first case \(T = C = \tau_u\) and censoring is due to the true time of death being unknown as the observation period has finished. In the latter case, a separate censoring event, such as drop-out or another competing risk, is observed.

Left-censoring is a rarer form of censoring and occurs when the event happens at some unknown time before the study start, \(Y < \tau_l\). Interval-censoring occurs when the event takes place in some interval within the study period, but the exact time of event is unknown. (Figure 4.1) shows a graphical representation of right-censoring.

TODO
Figure 4.1: Dead and censored subjects (y-axis) over time (x-axis). Black diamonds indicate true death times and white circles indicate censoring times. Vertical line is the study end time. Subjects 1 and 2 die in the study time. Subject 3 is censored in the study and (unknown) dies within the study time. Subject 4 is censored in the study and (unknown) dies after the study. Subject 5 dies after the end of the study.

Censoring ‘Dependence’

Censoring is often defined as uninformative if \(Y \perp \!\!\! \perp C\) and informative otherwise however these definitions can be misleading as the term ‘uninformative’ appears to be imply that censoring is independent of both \(X\) and \(Y\), and not just \(Y\). Instead the following more precise definitions are used in this report.

Definition 4.1 (Censoring) Let \((X,T,\Delta,Y,C)\) be defined as above, then

  • If \(C \perp \!\!\! \perp X\), censoring is feature-independent, otherwise censoring is feature-dependent.
  • If \(C \perp \!\!\! \perp Y\), then censoring is event-independent, otherwise censoring is event-dependent.
  • If \((C \perp \!\!\! \perp Y) | X\), censoring is conditionally independent of the event given covariates, or conditionally event-independent.
  • If \(C \perp \!\!\! \perp(X,Y)\) censoring is uninformative, otherwise censoring is informative.

Non-informative censoring can generally be well-handled by models as true underlying patterns can still be detected and the reason for censoring does not affect model inference or predictions. However in the real-world, censoring is rarely non-informative as reasons for drop-out or missingness in outcomes tend to be related to the study of interest. Event-dependent censoring is a tricky case that, if not handled appropriately (by a competing-risks framework), can easily lead to poor model development; the reason for this can be made clear by example: Say a study is interested in predicting the time between relapses of stroke but a patient suffers a brain aneurysm due to some separate neurological condition, then there is a high possibility that a stroke may have occurred if the aneurysm had not. Therefore a survival model is unlikely to distinguish the censoring event (aneurysm) from the event of interest (stroke) and will confuse predictions. In practice, the majority of models and measures assume that censoring is conditionally event-independent and hence censoring can be predicted by the covariates whilst not directly depending on the event. For example if studying the survival time of ill pregnant patients in hospital, then dropping out of the study due to pregnancy is clearly dependent on how many weeks pregnant the patient is when the study starts (for the sake of argument assume no early/late pregnancy due to illness).

Type I Censoring

Type I and Type II censoring are special-cases of right-censoring, only Type I is discussed in this book as it is more common in simulation experiments. Type I censoring occurs if a study has a set end-date, or maximum survival time, and a patient survives until the end of the study. If survival times are dependent on covariates (i.e. not random) and the study start date is known (or survival times are shifted to the same origin) then Type I censoring will usually be informative as censored patients will be those who survived the longest.

4.2 Book Scope

Now that the mathematical setting has been defined, the book scope is provided. For time and relevance the scope of this book is narrowed to the most parsimonious setting that is genuinely useful in modelling real-world scenarios. This is the setting that captures all assumptions made by the majority of proposed survival models and therefore is practical both theoretically and in application. This setting is defined by the following assumptions (with justifications):

  • Let \(p\) be the proportion of censored observations in the data, then \(p \in (0,1)\). This open interval prevents the case when \(p = 0\), which is simply a regression problem (Section 3.1.2.2), or the case when \(p = 1\), in which no useful models exist (as the event never occurs).
  • Only right-censoring is observed in the data, no left- or interval-censoring. This accurately reflects most real-world data in which observations that have experienced the event before the study start (left-censoring) are usually not of interest, and close monitoring of patients means that interval-censoring is unlikely in practice. It is acknowledged that left-truncation is a common problem in medical datasets though this is often handled not by models but by data pre-processing, which is not part of the workflow discussed in this book.
  • There is only one event of interest, an observation that does not experience this event is censored. This eliminates the ‘competing risk’ setting in which multiple events of interest can be modelled.
  • The event can happen at most once. For example the event could be death or initial diagnosis of a disease however cannot be recurrent such as seizure. In the case where the event could theoretically happen multiple times, only the time to one (usually the first) occurrence of the event is modelled.
  • The event is guaranteed to happen at least once. This is an assumption implicitly made by all survival models as predictions are for the time until the true event, \(Y\), and not the observed outcome, \(T\).

For both the multi-event and recurrent-event cases, simple reductions exist such that these settings can be handled by the models discussed in this paper however this is not discussed further here.

No assumptions are made about whether censoring is dependent on the data but when models and measures make these assumptions, they will be explicitly discussed.

The purpose of any statistical analysis is dependent on the research question. For example techniques are available for data analysis, imputation, exploration, prediction, and more. This book focuses on the predictive setting; other objectives, such as model inspection and data exploration can be achieved post-hoc via interpretable machine learning techniques (Molnar 2019).

Finally, the methods in this book are restricted to frequentist statistics. Bayesian methods are not discussed as the frequentist setting is usually more parsimonious and additionally there are comparatively very few off-shelf implementations of Bayesian survival methods. Despite this, it is noted that Bayesian methods are particularly relevant to the research in this book, which is primarily concerned with uncertainty estimates and predictions of distributions. Therefore, a natural extension to the work in this book would be to fully explore the Bayesian setting.

4.3 Survival Prediction Problems

This section continues by defining the survival problem narrowed to the scope described in the previous section. Defining a single ‘survival prediction problem’ (or ‘task’) is important mathematically as conflating survival problems could lead to confused interpretation and evaluation of models. Let \((X,T,\Delta)\) and \(\mathcal{D}\) be as defined above. A general survival prediction problem is one in which:

  • a survival dataset, \(\mathcal{D}\), is split (Section 3.1.1) for training, \(\mathcal{D}_{train}\), and testing, \(\mathcal{D}_{test}\);
  • a survival model is fit on \(\mathcal{D}_{train}\); and
  • the model predicts some representation of the unknown true survival time, \(Y\), given \(\mathcal{D}_{test}\).

The process of ‘fitting’ is model-dependent, and can range from simple maximum likelihood estimation of model coefficients, to complex algorithms. The model fitting process is discussed in more abstract detail in (Section 3.1) and then concrete algorithms are discussed in (?sec-review). The different survival problems are separated by ‘prediction types’ or ‘prediction problems’, these can also be thought of as predictions of different ‘representations’ of \(Y\). Four prediction types are discussed in this paper, these may be the only possible survival prediction types and are certainly the most common as identified in chapters (?sec-review) and (Chapter 5). They are predicting:

  • The relative risk of an individual experiencing an event – A single continuous ranking.
  • The time until an event occurs – A single continuous value.
  • The prognostic index for a model – A single continuous value.
  • An individual’s survival distribution – A probability distribution.

The first three of these are referred to as deterministic problems as they predict a single value whereas the fourth is probabilistic and returns a full survival distribution. Definitions of these are expanded on below.

Survival predictions differ from other fields in two respects. Firstly, the predicted outcome, \(Y\), is a different object than the outcome used for model training, \((T, \Delta)\). This differs from, say, regression in which the same object (a single continuous variable) is used for fitting and predicting. Secondly, with the exception of the time-to-event prediction, all other prediction types do not predict \(Y\) but some other related quantity.

Survival prediction problems must be clearly separated as they are inherently incompatible. For example it is not meaningful to compare a relative risk prediction from one observation to a survival distribution of another. Whilst these prediction types are separated above, they can be viewed as special cases of each other. Both (1) and (2) may be viewed as variants of (3); and (1), (2), and (3) can all be derived from (4); this is elaborated on below.

Relative Risk/Ranking

This is perhaps the most common survival problem and is defined as predicting a continuous rank for an individual’s ‘relative risk of experiencing the event’. For example, given three patients, \(\{i,j,k\}\), a relative risk prediction may predict the ‘risk of event’ as \(\{0.1, 0.5, 10\}\) respectively. From these predictions, the following types of conclusions can be drawn:

  • Conclusions comparing patients. e.g. \(i\) is at the least risk; the risk of \(j\) is only slightly higher than that of \(i\) but the risk of \(k\) is considerably higher than \(j\); the corresponding ranks for \(i,j,k,\) are \(1,2,3\).
  • Conclusions comparing risk groups. e.g. thresholding risks at \(1.0\) places \(i\) and \(j\) in a ‘low-risk’ group and \(k\) in a ‘high-risk’ group

So whilst many important conclusions can be drawn from these predictions, the values themselves have no meaning when not compared to other individuals. Interpretation of these rankings has historically been conflicting in implementation, with some software having the interpretation ‘higher ranking implies higher risk’ whereas others may indicate ‘higher ranking implies lower risk’ \(\ref{sec:tools_mlr3proba_api_learn}\). In this book, a higher ranking will always imply a higher risk of event (as in the example above).

Time to Event

Predicting a time to event is the problem of predicting the deterministic survival time of a patient, i.e. the amount of time for which they are predicted to be alive after some given start time. Part of the reason this problem is less common in survival analysis is because it borders regression – a single continuous value is predicted – and survival – the handling of censoring is required – but neither is designed to solve this problem directly. Time-to-event predictions can be seen as a special-case of the ranking problem as an individual with a predicted longer survival time will have a lower overall risk, i.e. if \(t_i,t_j\) and \(r_i,r_j\) are survival time and ranking predictions for patients \(i\) and \(j\) respectively, then \(t_i > t_j \rightarrow r_i < r_j\).

Prognostic Index

Given covariates, \(x \in \mathbb{R}^{n \times p}\), and a vector of model coefficients, \(\beta \in \mathbb{R}^p\), the linear predictor is defined by \(\eta := x\beta \in \mathbb{R}^n\). The ‘prognostic index’ is a term that is often used in survival analysis papers that usually refers to some transformation (possibly identity), \(\phi\), on the linear predictor, \(\phi(\eta)\). Assuming a predictive function (for survival time, risk, or distribution defining function (see below)) of the form \(g(\varphi)\phi(\eta)\), for some function \(g\) and variables \(\varphi\) where \(g(\varphi)\) is constant for all observations (e.g. Cox PH (Section 11.1.2)), then predictions of \(\eta\) are a special case of predicting a relative risk, as are predictions of \(\phi(\eta)\) if \(\phi\) is rank preserving. A higher prognostic index may imply a higher or lower risk of event, dependent on the model structure.

Survival Distribution

Predicting a survival distribution refers specifically to predicting the distribution of an individual patient’s survival time, i.e. modelling the distribution of the event occurring over \(\mathbb{R}_{\geq 0}\). Therefore this is seen as the probabilistic analogue to the deterministic time-to-event prediction, these definitions are motivated by similar terminology in machine learning regression problems (Section 3.1). The above three prediction types can all be derived from a probabilistic survival distribution prediction (Chapter 19).

A survival distribution is a mathematical object that is estimated by predicting a representation of the distribution. Let \(W\) be a continuous random variable t.v.i. \(\mathbb{R}_{\geq 0}\) with probability density function (pdf), \(f_W: \mathbb{R}_{\geq 0}\rightarrow \mathbb{R}_{\geq 0}\), and cumulative distribution function (cdf), \(F_W: \mathbb{R}_{\geq 0}\rightarrow [0,1]; (\tau) \mapsto P(W \leq \tau)\). The pdf, \(f_W(\tau)\), is the likelihood of an observation dying in a small interval around time \(\tau\), and \(F_W(\tau) = \int^\tau_0 f_W(\tau)\) is the probability of an observation being dead at time \(\tau\) (i.e. dying at or before \(\tau\)). In survival analysis, it is generally more interesting to model the risk of the event taking place or the probability of the patient being alive, leading to other distribution representations of interest.

The survival function is defined as \[ S_W: \mathbb{R}_{\geq 0}\rightarrow [0,1]; \quad (\tau) \mapsto P(W \geq \tau) = \int^\infty_\tau f_W(u) \ du \] and so \(S_W(\tau) = 1-F_W(\tau)\). This function is known as the survival function as it can be interpreted as the probability that a given individual survives until some point \(\tau \geq 0\).

Another common representation is the hazard function, \[ h_W: \mathbb{R}_{\geq 0}\rightarrow \mathbb{R}_{\geq 0}; \quad (\tau) \mapsto \frac{f_W(\tau)}{S_W(\tau)} \] The hazard function is interpreted as the instantaneous risk of death given that the observation has survived up until that point; note this is not a probability as \(h_W\) can be greater than one.

The cumulative hazard function (chf) can be derived from the hazard function by \[ H_W: \mathbb{R}_{\geq 0}\rightarrow \mathbb{R}_{\geq 0}; \quad (\tau) \mapsto \int^\tau_0 h_W(u) \ du \]

The cumulative hazard function relates simply to the survival function by \[ H_W(\tau) = \int^\tau_0 h_W(u) \ du = \int^\tau_0 \frac{f_W(u)}{S_W(u)} \ du = \int^\tau_0 -\frac{S'_W(u)}{S_W(u)} \ du = -\log(S_W(\tau)) \]

Any of these representations may be predicted conditionally on covariates for an individual by a probabilistic survival distribution prediction. Once a function has been estimated, predictions can be made conditional on the given data. For example if \(n\) survival functions are predicted, \(\hat{S}_1,...,\hat{S}_n\), then \(\hat{S}_i\) is interpreted as the predicted survival function given covariates of observation \(i\), and analogously for the other representation functions.

4.4 Survival Analysis Task

The survival prediction problems identified in (Section 4.3) are now formalised as machine learning tasks.

Survival Task

Box 4.1 Let \((X,T,\Delta)\) be random variables t.v.i. \(\mathcal{X}\times \mathcal{T}\times \{0,1\}\) where \(\mathcal{X}\subseteq \mathbb{R}^p\) and \(\mathcal{T}\subseteq \mathbb{R}_{\geq 0}\). Let \(\mathcal{S}\subseteq \ensuremath{\operatorname{Distr}(\mathcal{T})}\) be a convex set of distributions on \(\mathcal{T}\) and let \(\mathcal{R}\subseteq \mathbb{R}\). Then,

  • The probabilistic survival task is the problem of predicting a conditional distribution over the positive Reals and is specified by \(g: \mathcal{X}\rightarrow \mathcal{S}\).
  • The deterministic survival task is the problem of predicting a continuous value in the positive Reals and is specified by \(g: \mathcal{X}\rightarrow \mathcal{T}\).
  • The survival ranking task is specified by predicting a continuous ranking in the Reals and is specified by \(g: \mathcal{X}\rightarrow \mathcal{R}\).

The estimated prediction functional \(\hat{g}\) is fit on training data \\((X_1,T_1,\Delta_1),...,(X_n,T_n,\Delta_n) \stackrel{i.i.d.}\sim(X,T,\Delta)\) and is considered ‘good’ if \\(\mathbb{E}[L(T^*, \Delta^*, \hat{g}(X^*))]\) is low, where \((X^*, T^*, \Delta^*) \sim (X, T, \Delta)\) is independent of \((X_1,T_1,\Delta_1),...,(X_n,T_n,\Delta_n)\) and \(\hat{g}\).

Any other survival prediction type falls within one of these tasks above, for example predicting log-survival time is the deterministic task and predicting prognostic index or linear predictor is the ranking task. Removing the separation between the prognostic index and ranking prediction types is due to them both making predictions over the Reals; their mathematical difference lies in interpretation only. In general, the survival task will assume that \(\mathcal{T}\subseteq \mathbb{R}_{\geq 0}\), and the terms ‘discrete’ or ‘reduced survival task’ will refer to the case when \(\mathcal{T}\subseteq \mathbb{N}_{> 0}\). Unless otherwise specified, the ‘survival task’, will be used to refer to the probabilistic survival task.

Survival Analysis and Regression

Survival and regression tasks are closely related as can be observed from their respective definitions. Both are specified by \(g : \mathcal{X}\rightarrow \mathcal{S}\) where for probabilistic regression \(\mathcal{S}\subseteq \operatorname{Distr}(\mathbb{R})\) and for survival \(\mathcal{S}\subseteq \operatorname{Distr}(\mathbb{R}_{\geq 0})\). Furthermore both settings can be viewed to use the same generative process. In the survival setting in which there is no censoring then data is drawn from \((X,Y) \ t.v.i. \ \mathcal{X}\times \mathcal{T}, \mathcal{T}\subseteq \mathbb{R}_{\geq 0}\) and in regression from \((X,Y) \ t.v.i. \ \mathcal{X}\times \mathcal{Y}, \mathcal{Y}\subseteq \mathbb{R}\), so that the only difference is whether the outcome data ranges over the Reals or positive Reals.

These closely related tasks are discussed in more detail in (Chapter 19), with a particular focus on how the more popular regression setting can be used to solve survival tasks. In (?sec-review) the models are first introduced in a regression setting and then the adaptations to survival are discussed, which is natural when considering that historically machine learning survival models have been developed by adapting regression models.