24 Conclusions
We are working on this page and it will be available soon!
24.1 Common problems in survival analysis
24.1.1 Data cleaning
Events at t=0
Throughout this book we have defined survival times taking values in the non-negative Reals (zero inclusive) \(\mathbb{R}_{\geq 0}\). In practice, model implementations assume time is over the positive Reals (zero exclusive). One must therefore consider how to deal with subjects that experience the outcome at \(t=0\). There is no established best practice for dealing with this case as the answer may be data-dependent. Possible choices include:
- Delete all observations where the outcome occurs at \(t=0\). This may be appropriate if it only happens in a small number of observations, in which case deletion is unlikely to bias predictions;
- Update the survival time to the smallest observed survival time after \(t=0\). For example, if the first observation to experience the event after \(t=0\) happens at \(t=0.1\), then set \(0.1\) as the survival time for any observation experiencing the event at \(t=0\). Note that this method will not be appropriate when the data spans a long period, for example if time is measured in years, as there could then be a substantial difference between \(t=0\) and \(t=1\);
- Update the survival time to a very small value \(\epsilon\) that makes sense given the context of the data, e.g., \(\epsilon = 0.0001\).
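The three options above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a recommended implementation: the function name `handle_zero_times`, its arguments, and the strategy labels are hypothetical, the data is assumed to be stored as NumPy arrays of times and event indicators, and only observations that experience the event at \(t=0\) are modified (observations censored at \(t=0\) are left untouched).

```python
import numpy as np


def handle_zero_times(time, event, strategy="epsilon", epsilon=1e-4):
    """Handle observations that experience the event at t=0.

    Hypothetical helper for illustration only.

    strategy:
      "delete"  - drop observations whose event occurs at t=0
      "next"    - move them to the smallest observed event time after 0
      "epsilon" - move them to a small positive value `epsilon`
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    # Observations that experience the event exactly at t=0
    at_zero = (time == 0) & event

    if strategy == "delete":
        keep = ~at_zero
        return time[keep], event[keep]

    if strategy == "next":
        # Smallest time at which an event is observed after t=0
        later = time[event & (time > 0)]
        if later.size == 0:
            raise ValueError("no event observed after t=0")
        replacement = later.min()
    elif strategy == "epsilon":
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        replacement = epsilon
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Work on a copy so the caller's array is not modified
    new_time = time.copy()
    new_time[at_zero] = replacement
    return new_time, event


# Toy data: the first observation has an event at t=0,
# the fourth is censored at t=0 and is therefore left as-is.
time = [0.0, 0.1, 0.5, 0.0, 2.0]
event = [1, 1, 0, 0, 1]

t_del, e_del = handle_zero_times(time, event, strategy="delete")
t_next, e_next = handle_zero_times(time, event, strategy="next")
t_eps, e_eps = handle_zero_times(time, event, strategy="epsilon")
```

Which strategy is appropriate remains data-dependent, as discussed above; in particular, the value of `epsilon` should be chosen relative to the time scale of the data.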
Continuous v Discrete Time
We defined survival tasks throughout this book assuming continuous time predictions in \(\mathbb{R}_{\geq 0}\). In practice, many outcomes in survival analysis are recorded on a discrete scale, such as in medical statistics where outcomes are observed on a yearly, monthly, daily, or hourly basis. Whilst discrete-time survival analysis exists for this purpose (Chapter 21), software implementations overwhelmingly use theory from the continuous-time setting. There has been little research into whether discrete-time methods outperform continuous-time methods when correctly applied to discrete data, but the available experiments do not indicate that discrete methods outperform their continuous counterparts (Suresh, Severn, and Ghosh 2022). It is therefore recommended to use available software implementations, even when data is recorded on a discrete scale.
24.1.2 Evaluation and prediction
- Which time points to make predictions for?