
I have a data frame with 379838 rows and 13 variables in columns (13 clinical samples):

    > str(df)
    'data.frame':   379838 obs. of  13 variables:
     $ V1 : num  0.8146 0.7433 0.0174 0.177 0 ...
     $ V2 : num  0.7465 0.5833 0.0848 0.5899 0.0161 ...
     $ V3 : num  0.788 0.843 0.333 0.801 0.156 ...
     $ V4 : num  0.601 0.958 0.319 0.807 0.429 ...
     $ V5 : num  0.792 0.49 0.341 0.865 1 ...
     $ V6 : num  0.676 0.801 0.229 0.822 0.282 ...
     $ V7 : num  0.783 0.732 0.223 0.653 0.507 ...
     $ V8 : num  0.69 0.773 0.108 0.69 0.16 ...
     $ V9 : num  0.4014 0.5959 0.0551 0.7578 0.2784 ...
     $ V10: num  0.703 0.784 0.131 0.698 0.204 ...
     $ V11: num  0.6731 0.8224 0.125 0.6021 0.0772 ...
     $ V12: num  0.7889 0.7907 0.0881 0.7175 0.2392 ...
     $ V13: num  0.6731 0.8221 0.0341 0.4059 0 ...

I am trying to make a ggplot2 box plot that groups the variables into three groups (V1-V5, V6-V9 and V10-V13) and assigns a different color to the variables of each group.

I am trying the following code:

    df1 = as.vector(df[, c("V1", "V2", "V3", "V4", "V5")])
    df2 = as.vector(df[, c("V6", "V7", "V8", "V9")])
    df3 = as.vector(df[, c("V10", "V11", "V12", "V13")])
    sample = c(df1, df2, df3)

    library(reshape2)

    meltData1 <- melt(df, varnames="sample")

    str(meltData1)
    'data.frame':  4937894 obs. of  2 variables:
     $ variable: Factor w/ 13 levels "V1","V2","V3",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ value   : num  0.8146 0.7433 0.0174 0.177 0 ...

    p = ggplot(data=meltData1, aes(variable, value, fill=x$sample))
    p + geom_boxplot()

That gives me white box plots. How can I assign a colour to three groups of variables? Many thanks in advance!

  • Welcome to SO! It could be useful to add a sample of your data to your question. You may use dput(head(df)) for this, for example. Commented Feb 13, 2013 at 17:38

2 Answers

As sample data were not provided, I made a new data frame containing 13 columns named V1 to V13.

    df <- as.data.frame(matrix(rnorm(1300), ncol = 13))

The function melt() from the reshape2 package transforms the data from wide to long format. The resulting data frame has two columns: variable and value.

    library(reshape2)
    dflong <- melt(df)

A new column, sample, is added to the long-format data. Here I repeat the names group1, group2 and group3 according to the number of rows in the original data frame and the number of original columns in each group.

    dflong$sample <- c(rep("group1", nrow(df) * 5), rep("group2", nrow(df) * 4), rep("group3", nrow(df) * 4))
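The rep() approach relies on the rows of the melted data being ordered column by column. As a hedged alternative, the group can be looked up from the variable name itself, so the result does not depend on row order. This is only a sketch: the lookup vector group_of and the fixed seed are assumptions introduced for illustration, not part of the original answer.

```r
library(reshape2)

set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- as.data.frame(matrix(rnorm(1300), ncol = 13))
dflong <- melt(df)  # no id variables, so all 13 columns are melted

# Hypothetical lookup table: names are column names, values are group labels
group_of <- c(
  setNames(rep("group1", 5), paste0("V", 1:5)),
  setNames(rep("group2", 4), paste0("V", 6:9)),
  setNames(rep("group3", 4), paste0("V", 10:13))
)

# Index the lookup by each row's variable name
dflong$sample <- unname(group_of[as.character(dflong$variable)])

# Count rows per group as a quick sanity check
table(dflong$sample)
```

With this layout, changing which columns belong to which group only requires editing the lookup vector.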

The new column is then passed to the fill= argument to set colors according to the grouping.

    library(ggplot2)
    ggplot(data = dflong, aes(variable, value, fill = sample)) + geom_boxplot()


4 Comments

(+1) If you change data.frame(.) to as.data.frame(.) in the first line, you don't have to set the column names using colnames.
@Arun (+1) didn't know that difference between data.frame() and as.data.frame().
sure, np. If you type at R terminal as.data.frame.matrix you'll see that names(value) <- paste0("V", ic) is explicitly set already. But if you type data.frame, then you'll see that the row.names is copied back (Just to tell the reason).
It worked only with data.matrix instead of matrix(x). Thank you very much!
This is a follow-up to Didzis Elferts's answer.

Objective: Split the sample into 3 colour groups with a difference in shade within the colour group.

The first part of the code is the same:

    df <- as.data.frame(matrix(rnorm(1300), ncol = 13))
    library(reshape2)
    dflong <- melt(df)
    dflong$sample <- c(rep("group1", nrow(df) * 5), rep("group2", nrow(df) * 4), rep("group3", nrow(df) * 4))
    library(ggplot2)

Now, use the package RColorBrewer to select color shades:

    library(RColorBrewer)

Create a vector of colors for each color class, then combine them:

    col.g <- brewer.pal(9, "Greens")[5:9] # select 5 colors from class Greens
    col.r <- brewer.pal(9, "Reds")[6:9]   # select 4 colors from class Reds
    col.b <- brewer.pal(9, "Blues")[6:9]  # select 4 colors from class Blues
    my.cols <- c(col.g, col.r, col.b)

Take a look at the colors selected:

    image(1:13, 1, as.matrix(1:13), col = my.cols, xlab = "my palette", ylab = "", xaxt = "n", yaxt = "n", bty = "n")

And now plot with the colors we have created:

    ggplot(data = dflong, aes(variable, value, colour = variable)) + geom_boxplot() + scale_colour_manual(values = my.cols)

With the colour aesthetic and scale_colour_manual(), only the box outlines are colored. To color the box interiors, use fill and scale_fill_manual() instead:

    ggplot(data = dflong, aes(variable, value, fill = variable)) + geom_boxplot() + scale_fill_manual(values = my.cols)
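An unnamed values vector is matched to the factor levels by position, so the palette silently depends on the column order. A possible refinement, sketched here under the assumption that the columns are named V1 to V13 as in the example data, is to name the palette entries so that scale_fill_manual() matches colors to variables by name.

```r
library(reshape2)
library(ggplot2)
library(RColorBrewer)

df <- as.data.frame(matrix(rnorm(1300), ncol = 13))
dflong <- melt(df)

# Same 13 shades as in the palette built earlier in this answer
my.cols <- c(brewer.pal(9, "Greens")[5:9],
             brewer.pal(9, "Reds")[6:9],
             brewer.pal(9, "Blues")[6:9])

# Assumption: columns are named V1..V13; naming makes the mapping explicit
names(my.cols) <- paste0("V", 1:13)

p <- ggplot(dflong, aes(variable, value, fill = variable)) +
  geom_boxplot() +
  scale_fill_manual(values = my.cols)
print(p)
```

Because the colors are keyed by name, reordering the factor levels of variable changes the order of the boxes but not which color each variable gets.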


P.S. I'm a total newbie and learning R myself. I saw this question as an opportunity to apply something I just learned.

1 Comment

Great, Patrick, thank you very much!! My main headache was grouping variables with rep command...
