17  Alternative Methods

Abstract
TODO (150-200 WORDS)
Major changes expected!

This page is a work in progress and major changes will be made over time.

This survey has not covered all machine learning models exhaustively, and entire model classes have been omitted; this short section briefly discusses those classes.

Bayesian Models

In terms of accessibility, many more off-the-shelf survival model implementations exist in the frequentist framework than in the Bayesian one. Despite this, there is good evidence that Bayesian survival models, such as Bayesian neural networks (Bakker et al. 2004; Faraggi et al. 1997), can perform well (Bishop 2006), and a survey of these models may be explored in future work.

Gaussian Processes

Gaussian Processes (GPs) are a class of models that naturally fit the survival paradigm, as they model the joint distribution of random variables over some continuous domain, often time. The simplest extension from a standard Cox model to a GP is given by the non-linear hazard \[ h(\tau|X_i) = h_0(\tau)\phi(g(\tau|X_i)); \quad g(\cdot) \sim \mathcal{G}\mathcal{P}(0, k) \] where \(\phi\) is a non-negative link function, \(\mathcal{G}\mathcal{P}\) is a Gaussian process (Rasmussen and Williams 2004), and \(k\) is a kernel function with parameters to be estimated (Kim and Pavlovic 2018). Hyper-parameters are learnt by evaluating the likelihood function (Bishop 2006), and in the context of survival analysis this is commonly performed by assuming an inhomogeneous Poisson process (Fernández, Rivera, and Teh 2016; Saul 2016; Vehtari and Joensuu 2013). For a comprehensive survey of GPs for survival, see Saul (2016). There is evidence of GPs outperforming Cox and ML models (Fernández, Rivera, and Teh 2016). GPs are excluded from this survey due to a lack of implementations (and thus accessibility) and poorer transparency. Future research could look at increasing the off-the-shelf accessibility of these models.
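To make the GP hazard above more concrete, the following is a minimal sketch rather than an implementation of any of the cited methods. It draws one sample of \(g\) from a GP prior on a time grid and converts it into a hazard and a survival curve. Several choices here are assumptions made purely for illustration: the exponential link \(\phi = \exp\), the squared-exponential kernel, the constant baseline hazard, and the omission of covariates \(X_i\) (so \(g\) varies over time only). The function names `rbf_kernel` and `sample_gp_hazard` are hypothetical.

```python
import numpy as np

def rbf_kernel(t, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix k(t, t') on a 1-D grid (assumed kernel)."""
    diff = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def sample_gp_hazard(t, baseline, lengthscale=1.0, variance=1.0, seed=0):
    """Draw g ~ GP(0, k) on the grid t and return h = h0 * exp(g)."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(t, lengthscale, variance)
    K += 1e-8 * np.eye(len(t))  # small jitter for numerical stability
    g = rng.multivariate_normal(np.zeros(len(t)), K)
    return baseline * np.exp(g)  # exp link keeps the hazard positive

# Time grid and a constant baseline hazard (both illustrative assumptions).
t = np.linspace(0.1, 5.0, 50)
h0 = np.full_like(t, 0.2)
h = sample_gp_hazard(t, h0)

# Cumulative hazard by the trapezoid rule, then S(t) = exp(-H(t)).
H = np.concatenate([[0.0], np.cumsum(0.5 * (h[1:] + h[:-1]) * np.diff(t))])
S = np.exp(-H)
```

This only samples from the prior; fitting such a model would additionally require a likelihood for the observed (possibly censored) event times and a procedure for estimating the kernel parameters, as discussed in the references above.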

Non-Supervised Learning

As well as pure supervised learning, there are also survival models that use active learning (Nezhad et al. 2019) or transfer learning, or that treat survival analysis as a Markov process. As with GPs, none of these are currently available off-the-shelf and all require expert knowledge to be useful. These are not discussed in detail here, but a very brief introduction to the Markov Process (MP) set-up is provided to motivate further consideration of the area.

The survival set-up can be visualised as a Markov chain. At each discrete time-point \(t_1,...,t_{K-1}\), an individual can either move to the next time-point (and therefore be alive at that time-point), or move to one of the absorbing states (‘Dead’ and ‘Censored’). The final time-point, \(t_K\), is never visited as an individual must be dead or censored at the end of a study, and hence is last seen alive at \(t_{K-1}\). In this set-up, data are assumed to be sequential and the time of death or censoring is determined by the last state at which the individual was seen to be alive, plus one, i.e. if an individual transitions from \(t_k\) to ‘Dead’, then they died at \(t_{k+1}\). This setting assumes the Markov property, so that the probability of moving to the ‘next’ state depends only on the current one. This method lends itself naturally to competing risks, which would extend the ‘Dead’ state to multiple absorbing states, one for each risk. Additionally, left-censoring can be naturally incorporated without further assumptions (Abner, Charnigo, and Kryscio 2013).
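The Markov chain set-up described above can be sketched as a small simulation. This is an illustrative sketch only: the function `simulate_individual`, the transition probabilities, and the rule that anyone who does not die at \(t_{K-1}\) is censored are assumptions made here, not part of any cited method.

```python
import numpy as np

def simulate_individual(p_dead, p_cens, rng):
    """Simulate one path through the chain.

    p_dead[k - 1] and p_cens[k - 1] are the (assumed) probabilities of moving
    from the alive state t_k to 'Dead' or 'Censored'; both have length K - 1.
    Returns (time index, status), where the time index is the last alive
    state plus one.
    """
    last = len(p_dead)  # index of the final alive state, t_{K-1}
    for k in range(1, last + 1):
        u = rng.random()
        if u < p_dead[k - 1]:
            return k + 1, "Dead"
        # t_K is never visited, so at t_{K-1} anyone not dying is censored.
        if k == last or u < p_dead[k - 1] + p_cens[k - 1]:
            return k + 1, "Censored"
        # Otherwise the individual survives to the next time-point.

# Hypothetical transition probabilities for K = 5 time-points.
rng = np.random.default_rng(42)
p_dead = np.array([0.1, 0.2, 0.3, 0.5])
p_cens = np.array([0.05, 0.05, 0.05, 0.5])
outcomes = [simulate_individual(p_dead, p_cens, rng) for _ in range(1000)]
```

Because each step uses only the probabilities attached to the current state, the simulation respects the Markov property. Competing risks could be sketched in the same way by replacing the single death probability with one probability per risk.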

This set-up has been considered in survival analysis both for Markov models and in the context of reinforcement learning (Data Study Group Team 2020), though the latter case is underdeveloped and future research could pursue this further.