R - Multiple data points in forest plot using ggplot2

Question

Example data:

df <- data.frame(Mean1=c(12,15,17,14,16,18,16,14),Lower1=c(8,11,13,7,15,12,12,11),Upper1=c(16,18,21,21,17,24,20,17),Mean2=c(13,16,18,15,17,19,17,15),Lower2=c(9,12,14,8,16,13,13,12),Upper2=c(17,19,22,22,18,25,21,18))
rownames(df) <- c(1,2,3,4,5,6,7,8)

I can produce a forest plot with Mean1 Lower1 and Upper1 from df:

ggplot(df, aes(y = row.names(df), x = df$Mean1)) +
     geom_point(size = 4) +
     geom_errorbarh(aes(xmax = df$Upper1, xmin = df$Lower1))

So my question is: how can I add Mean2, Lower2 and Upper2 from df to the plot, so that both means for each observation (row) are shown as a pair with their respective error bars? The output would be a similar forest plot, but with both means and error limits for each observation displayed in pairs. I hope this makes sense.

I haven't tried anything because I simply don't know where to start.

Is it possible to do this without disrupting the structure of the data frame?

The easiest solution is to reshape the data frame to long format, so that each error bar has its own row with lower, upper, estimate and a grouping variable. Why do you need the structure intact?
– Heroka, commented Aug 26, 2015 at 9:09

cito · Accepted Answer · 2015-08-26 09:57:57Z

The most natural way to do this is with the position argument (position_dodge), but it needs the values to be grouped by a variable rather than spread across column names. You can build that grouping in place:

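# Pass no data to ggplot(): the vectors below have 2 * nrow(df) elements,
# which recent ggplot2 versions reject when data = df has only nrow(df) rows
ggplot(mapping = aes(x = rep(rownames(df), 2),
                     y = c(df$Mean1, df$Mean2),
                     group = rep(c(1, 2), each = nrow(df)))) +
  geom_point(position = position_dodge(1)) +
  coord_flip()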

A cleaner approach, though, is to reshape the data frame; it makes the plotting code simpler:

ggplot(df, aes(x = rownames, 
           y = Mean, 
           group=groups)) +
geom_point(size = 4, position=position_dodge(1))+
geom_errorbar(aes(ymax = Upper, ymin = Lower), position=position_dodge(1))+
coord_flip()

For this example I transformed the data frame as follows (run this before the plotting code above):

df <- data.frame(Mean=c(df$Mean1,df$Mean2),
                 Lower=c(df$Lower1,df$Lower2),
                 Upper=c(df$Upper1,df$Upper2),
                 groups=factor(rep(c(1,2), each=nrow(df))),
                 rownames=as.character(rep(rownames(df), 2)))

kasterma · Accepted Answer · 2015-08-26 09:21:37Z

I don't know how to do it without disrupting the structure of your data frame, but since your data frame is not tidy data I would recommend changing it anyway. The following might then answer your question:

library(tidyr)
df$itemid <- rownames(df)
df <- gather(df, type, value, -itemid)
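# split off the trailing group digit: "Mean1" -> "Mean", "1"
# (tidyr versions after 0.8.2 count -1 as the last character; older ones needed sep=-2)
df <- separate(df, type, into=c("type", "grpid"), sep=-1)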
df <- spread(df, type, value)

The steps are kept separate so that you can run them one at a time and see what each does. Then you can plot using:

library(ggplot2)
# refer to columns by bare name inside aes() rather than df$...
ggplot(df, aes(y = paste(itemid, grpid), x = Mean, color = grpid)) +
     geom_point(size = 4) +
     geom_errorbarh(aes(xmax = Upper, xmin = Lower))
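
The same reshaping can probably be done in one step with tidyr's newer pivot_longer(). The following is only a sketch under some assumptions: it needs tidyr 1.0.0 or later, the names wide and long are illustrative, and the regular expression assumes every column is named as letters followed by a group number (as in the example data). The pairs are dodged with position_dodge and flipped with coord_flip, as in the other answer.

```r
library(tidyr)
library(ggplot2)

# Example data from the question (wide format)
wide <- data.frame(
  Mean1  = c(12, 15, 17, 14, 16, 18, 16, 14),
  Lower1 = c(8, 11, 13, 7, 15, 12, 12, 11),
  Upper1 = c(16, 18, 21, 21, 17, 24, 20, 17),
  Mean2  = c(13, 16, 18, 15, 17, 19, 17, 15),
  Lower2 = c(9, 12, 14, 8, 16, 13, 13, 12),
  Upper2 = c(17, 19, 22, 22, 18, 25, 21, 18)
)
wide$itemid <- rownames(wide)

# ".value" keeps the letter part (Mean/Lower/Upper) as column names,
# while the trailing number becomes the grouping column grpid
long <- pivot_longer(
  wide,
  cols = -itemid,
  names_to = c(".value", "grpid"),
  names_pattern = "([A-Za-z]+)([0-9]+)"
)

# One row per observation, with the two groups dodged side by side
p <- ggplot(long, aes(x = itemid, y = Mean, color = grpid)) +
  geom_point(size = 4, position = position_dodge(0.6)) +
  geom_errorbar(aes(ymin = Lower, ymax = Upper),
                width = 0.2, position = position_dodge(0.6)) +
  coord_flip()
```

Calling print(p) draws the plot; long has one row per observation and group, with columns itemid, grpid, Mean, Lower and Upper.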


scoa · Accepted Answer · 2015-08-26 10:53:10Z

I am not sure what you mean, but do you want to plot the Mean2 values on top of the forest plot? In that case you can assign the first plot to a variable, let's say s1, and then add the new data to it like this (perhaps with different colors):

s1<-ggplot(df, aes(y = row.names(df), x = df$Mean1)) +
     geom_point(size = 4) +
     geom_errorbarh(aes(xmax = df$Upper1, xmin = df$Lower1))

s1 + geom_point(data=df, aes(y = row.names(df), x = df$Mean2)) + 
  geom_errorbarh(aes(xmax = df$Upper2, xmin = df$Lower2))

Otherwise you can restructure the data and then add facet_grid(. ~ Sample) to make separate graphs for your samples (Mean1 and Mean2).
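
A minimal sketch of that faceting idea, assuming the two column sets are stacked into one long data frame. The Sample column comes from the sentence above; the names long and item are illustrative only.

```r
library(ggplot2)

# Example data from the question (wide format)
df <- data.frame(
  Mean1  = c(12, 15, 17, 14, 16, 18, 16, 14),
  Lower1 = c(8, 11, 13, 7, 15, 12, 12, 11),
  Upper1 = c(16, 18, 21, 21, 17, 24, 20, 17),
  Mean2  = c(13, 16, 18, 15, 17, 19, 17, 15),
  Lower2 = c(9, 12, 14, 8, 16, 13, 13, 12),
  Upper2 = c(17, 19, 22, 22, 18, 25, 21, 18)
)

# Stack the two column sets; Sample records which set a row came from
long <- rbind(
  data.frame(item = rownames(df), Sample = "Mean1",
             Mean = df$Mean1, Lower = df$Lower1, Upper = df$Upper1),
  data.frame(item = rownames(df), Sample = "Mean2",
             Mean = df$Mean2, Lower = df$Lower2, Upper = df$Upper2)
)

# One panel per sample, observations on the vertical axis after the flip
p <- ggplot(long, aes(x = item, y = Mean)) +
  geom_point(size = 4) +
  geom_errorbar(aes(ymin = Lower, ymax = Upper), width = 0.2) +
  coord_flip() +
  facet_grid(. ~ Sample)
```

Calling print(p) draws two side-by-side panels, one for each sample.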

answered Aug 26, 2015 at 9:12 by timfaber; edited Aug 26, 2015 at 10:53 by scoa

4 Comments

Olli J Over a year ago

Thank you. This is the idea I was looking for. Do you know whether it is possible to display those values so that the items in each pair are lined up underneath each other?

Heroka Over a year ago

You could use y = as.numeric(row.names(df))-0.2 in the aes of the second part. But to repeat myself, why do you need the structure of the data frame intact? It's not best practice.

Olli J Over a year ago

I'm really inexperienced when it comes to R. The data I was given was structured the same way as the example, and I initially thought I could go on without changing it any further. I understand now that it is wise to reshape it and save myself the trouble of working with awkward data frames.

Heroka Over a year ago

Good luck! It gets easier with practice, I promise.
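
The offset suggested in the comments above might look roughly like this on the original wide data frame. This is a sketch only: the item column, the offset of 0.2 and the red colour are illustrative choices, and the intervals are drawn with geom_segment to keep the example short (geom_errorbarh, as used in the question, is an alternative).

```r
library(ggplot2)

# Example data from the question (wide format)
df <- data.frame(
  Mean1  = c(12, 15, 17, 14, 16, 18, 16, 14),
  Lower1 = c(8, 11, 13, 7, 15, 12, 12, 11),
  Upper1 = c(16, 18, 21, 21, 17, 24, 20, 17),
  Mean2  = c(13, 16, 18, 15, 17, 19, 17, 15),
  Lower2 = c(9, 12, 14, 8, 16, 13, 13, 12),
  Upper2 = c(17, 19, 22, 22, 18, 25, 21, 18)
)

# Numeric row position, so each set can be shifted up or down
df$item <- as.numeric(rownames(df))
offset <- 0.2

p <- ggplot(df) +
  # first set, drawn slightly above the row position
  geom_segment(aes(x = Lower1, xend = Upper1,
                   y = item + offset, yend = item + offset)) +
  geom_point(aes(x = Mean1, y = item + offset), size = 4) +
  # second set, drawn slightly below it
  geom_segment(aes(x = Lower2, xend = Upper2,
                   y = item - offset, yend = item - offset),
               colour = "red") +
  geom_point(aes(x = Mean2, y = item - offset), size = 4, colour = "red") +
  scale_y_continuous(breaks = df$item) +
  labs(x = "Mean", y = "Observation")
```

Calling print(p) draws the plot with each pair stacked vertically around its row number.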
