0

I am trying to follow this tutorial over here: https://rviews.rstudio.com/2017/09/25/survival-analysis-with-r/ (bottom of the page).

I have slightly modified the code for this tutorial and have plotted the "staircases" (i.e. "survival functions", in the below picture "red", "blue", "green") corresponding to 3 of the observations in the data:

 library(survival)
    library(dplyr)
    library(ranger)
    library(data.table)
library(ggplot2)
library(plotly)
    
a = na.omit(lung)
a$ID <- seq_along(a[,1])

r_fit <- ranger(Surv(time,status) ~ age + sex + ph.ecog + ph.karno + pat.karno + meal.cal + wt.loss, data = a, mtry = 4, 
importance = "permutation", splitrule = "extratrees", verbose = TRUE)

death_times <- r_fit$unique.death.times
surv_prob  <-data.frame(r_fit$survival)
avg_prob <- sapply(surv_prob, mean)

plot(r_fit$unique.death.times, r_fit$survival[1,], type = "l", ylim = c(0,1), col = "red", xlab = "Days", ylab = "survival", main = "Survival Curves")

new = a[1:3,]

pred <- predict(r_fit, new, type = 'response')$survival
pred <- data.table(pred)
colnames(pred) <- as.character(r_fit$unique.death.times)

plot(r_fit$unique.death.times, pred[1,], type = "l", col = "red")

lines(r_fit$unique.death.times, r_fit$survival[2,], type = "l", col = "green")

lines(r_fit$unique.death.times, r_fit$survival[3,], type = "l", col = "blue")

enter image description here

From here, I want to make the above plot as "interactive". I want to make so that when you move the mouse over one of the curves:

  1. The "properties" belonging to that curve (from object "a") hover (e.g. ID, age, sex, ph.ecog, etc.)

  2. In the same "hover box" from 1), also show the x-coordinate (r_fit$unique) and the y-coordinate (from "pred") for each position the mouse is hovering over (for a given curve)

My plan was first to take the "grob" object and convert it to a "ggplot" object, and then to convert the "ggplot" object into a "plotly" object:

 grob= plot(r_fit$unique.death.times, pred[1,], type = "l", col = "red")
basic_plot = ggpubr::as_ggplot(grob)

But when I try to inspect the "basic_plot", it shows up as "NULL".

 ggplot(f)
Error: `data` must be a data frame, or other object coercible by `fortify()`, not an S3 object with class gg/ggplot

Had this worked, I would have eventually converted the ggplot object to plotly:

plotly_plot = ggplotly(final_plot)

How to make this interactive plot?

I am trying to achieve something similar to this: https://plotly.com/python/v3/ipython-notebooks/survival-analysis-r-vs-python/ (towards the bottom of the page, plot with the title "lifespan of different tumor DNA profiles")

enter image description here

(Please Note: I am working with a computer that has no USB port or Internet connection, only R with a few pre-installed libraries... I do not have "ggplotify" or "survminer")

6
  • 1
    base plots do not work as objects like ggplot. You might need as.grob cran.r-project.org/web/packages/ggplotify/vignettes/…. Or have you tried making the plot in ggplot or plotly to begin with? Commented Dec 25, 2020 at 8:39
  • Unfortunately i don't have ggplotify on my work computer (no internet, no USB port) Commented Dec 25, 2020 at 8:42
  • Ah in that case perhaps build the plot in ggplot and convert with ggplotly (or directly in plotly) . I can't look at the moment from my phone but I suspect the issue is explained here stackoverflow.com/a/29583945/10142537. Maybe the grob=plot() returns NULL? Commented Dec 25, 2020 at 8:46
  • Ok, that was the problem, I've added an answer now. The ggplot code I've used is a basic example I can improve if you like. It would better fit the ggplot grammer for the data to be in a single dataframe (long data) and use one geom_line call rather than 3! Commented Dec 25, 2020 at 9:48
  • is it possible to change the answer so that : p <- ggplot(var1 = a$ID, var2 = a$age )+ geom_line(aes(x = r_fit$unique.death.times, y = t(pred[1,])), col = "red") + geom_line(aes(x = r_fit$unique.death.times, y = r_fit$survival[2,]), col = "green") + geom_line(aes(x = r_fit$unique.death.times, y = r_fit$survival[3,]), col = "blue") ggplotly(p, tooltip = c( "var1", "var2")) Commented Dec 25, 2020 at 20:08

1 Answer 1

2

The issue is that when you draw a plot in base graphics draw directly on a device. The line of your code grob= plot(r_fit$unique.death.times, pred[1,], type = "l", col = "red") creates a NULL object (unlike ggplot which would return a plot object).

You can make the plot directly in ggplot (there are a few ways of doing this but I've done a simple example bolow) and convert it with ggplotly:

fig_dat <- data.frame(time = r_fit$unique.death.times,
                      pred_1 = t(pred[1,]),
                      fit_1 = r_fit$survival[2,],
                      fit_2 = r_fit$survival[3,])

fig_dat_long <- fig_dat %>% pivot_longer(-time, names_to = "pred_fit", values_to = "pred_fit_values")

gg_p <- ggplot(fig_dat_long, aes(x = time, y = pred_fit_values, colour = pred_fit)) +
  geom_line()

ggplotly(gg_p)

Alternatively you could also plot in plotly directly:

fig_dat <- data.frame(time = r_fit$unique.death.times,
                      pred_1 = t(pred[1,]),
                      fit_1 <- r_fit$survival[2,],
                      fit_2 <- r_fit$survival[3,])


fig <- plot_ly(fig_dat, x = ~time, y = ~pred_1, name = 'pred1', type = 'scatter', mode = 'lines')
fig <- fig %>% add_trace(y = ~fit_1, name = 'fit 1', mode = 'lines') 
fig <- fig %>% add_trace(y = ~fit_2, name = 'fit 2', mode = 'lines')

## make dataframe of variables to plot:
fig_dat <- data.frame(time = r_fit$unique.death.times,
                      pred_1 = t(pred[1,]),
                      fit_1 = r_fit$survival[2,],
                      fit_2 = r_fit$survival[3,])

# to include the variables from 'a' we need to put them in the same dataframe for plotting
# Trouble is they are different lengths the predicted data are a little shorter
dim(fig_dat)
dim(a)
# We can join the two with inner join: https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right
fig_dat_join <- inner_join(fig_dat, a, by = "time")
dim(fig_dat_join)
# now they are equal dimensions and joined together but we have a slight issue with duplicate values:
sort(a$time) # we can see here that time 53 appears twice for example
a$time[duplicated(a$time)] # this tells us which values in time are duplicated
sort(death_times) # some
death_times[duplicated(death_times)] #none
# because of the duplicates some combinations are returned: see rows 9 and 10
fig_dat_join 

# I'm not familiar with the analysis so I'm not sure what the correct way in this case is to remove the duplicates in 'a' so that the dimentions of 'a' match the output of 'r-fit'
# You might need to look into that but it might not make much difference to the visualisation

# I've not used plotly a great deal so there is probably a better way of doing this but I've done my best and included the links as comments: https://plotly-r.com/overview.html
# labels: https://plotly.com/r/figure-labels/
x_labs <- list(
  title = "Time")

y_labs <- list(
  title = "y axis")

# T include extra info in hovertext: I https://stackoverflow.com/questions/49901771/how-to-set-different-text-and-hoverinfo-text

p1 <- plot_ly(data = fig_dat_join,
              x = ~time,
              # text = ~n,
              # textposition = "auto",
              # hoverinfo = "text",
              hovertext = paste("Time :", fig_dat_join$time,
                                "<br> Sex :", fig_dat_join$sex,
                                "<br> Inst :", fig_dat_join$inst,
                                "<br> ID :", fig_dat_join$ID,
                                "<br> Age :", fig_dat_join$age
                                )) %>% 
  add_trace(y = ~pred_1,
            type = 'scatter',
            name = 'Predictor 1',
            mode = 'lines') %>% 
  add_trace( y = ~fit_1,
            type = 'scatter',
            name = 'Fit 1',
            mode = 'lines') %>% 
  add_trace( y = ~fit_2,
             type = 'scatter',
             name = 'Fit 2',
             mode = 'lines') %>% 
  layout(xaxis = x_labs, yaxis = y_labs)

p1


I was making dataframe a match up with the unique.death.times by using left_join() above. If you don't need that then we could just move the hovertext code into each add_trace?

fig_dat <- data.frame(time = r_fit$unique.death.times,
                      pred_1 = t(pred[1,]),
                      fit_1 = r_fit$survival[2,],
                      fit_2 = r_fit$survival[3,])


p2 <- plot_ly(data = fig_dat,
              x = ~time,
              # text = ~n,
              # textposition = "auto",
              hoverinfo = "text"
) %>% 
  add_trace(y = ~pred_1,
            type = 'scatter',
            name = 'Predictor 1',
            mode = 'lines',
            hovertext = paste("Time :", fig_dat$time,
                              "<br> y axis :", fig_dat$pred_1,
                              "<br> Sex :", a$sex[1],
                              "<br> Inst :", a$inst[1],
                              "<br> ID :", a$ID[1],
                              "<br> Age :", a$age[1]
            )) %>% 
  add_trace( y = ~fit_1,
             type = 'scatter',
             name = 'Fit 1',
             mode = 'lines',
             hovertext = paste("Time :", fig_dat$time,
                               "<br> y axis :", fig_dat$fit_1,
                               "<br> Sex :", a$sex[2],
                               "<br> Inst :", a$inst[2],
                               "<br> ID :", a$ID[2],
                               "<br> Age :", a$age[2]
             )) %>% 
  add_trace( y = ~fit_2,
             type = 'scatter',
             name = 'Fit 2',
             mode = 'lines',
             hovertext = paste("Time :", fig_dat$time,
                               "<br> y axis :", fig_dat$fit_2,
                               "<br> Sex :", a$sex[3],
                               "<br> Inst :", a$inst[3],
                               "<br> ID :", a$ID[3],
                               "<br> Age :", a$age[3]
             )) %>% 
  layout(xaxis = x_labs, yaxis = y_labs)

p2

Sign up to request clarification or add additional context in comments.

9 Comments

Happy Christmas! Thank you so much - this is the best Christmas gift I could have asked for! Just 2 questions : 1) in the hover text, how do you change the title for "r_fit$survival" to "y-axis"? 2) To the hover text, can you also add information from the "a" object (e.g. age, sex, id, ph.ecog, etc)? Thanks, and Happy Christmas!
is it possible to change the answer so that : p <- ggplot(var1 = a$ID, var2 = a$age )+ geom_line(aes(x = r_fit$unique.death.times, y = t(pred[1,])), col = "red") + geom_line(aes(x = r_fit$unique.death.times, y = r_fit$survival[2,]), col = "green") + geom_line(aes(x = r_fit$unique.death.times, y = r_fit$survival[3,]), col = "blue") ggplotly(p, tooltip = c( "var1", "var2"))
I updated to include a answer that works for plotly. Just saw your comment for ggplot. I'll see if I can get to that too.
Ok, I've changed the ggplot code above. ggplot likes long data but after converting to with ggplotly() I'm not sure how to adjust the hovertext... Maybe go with the plotly code XD
Ah ok, I didn't get that part. I thought it was one value per time. In that case we can forget the join I think. Is the new update what you need? Each add_trace references a predictor or fit now and also references the first second or third index of dataframe a. You might want to double check some values to see if they match what you expect!
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.