I am having trouble making a scatter plot of my data. I have one independent variable, "Strain", and three explanatory values for each strain. Here is the structure of the data frame:

'data.frame':   30 obs. of  4 variables:
 $ Strain       : Factor w/ 30 levels "1","10","11",..: 1 12 14 15 25 27 28 29 30 2 ...
 $ second_hour  : Factor w/ 30 levels "10356.3888888889",..: 15 16 8 14 7 6 11 10 13 12 ...
 $ second_hour_n: Factor w/ 30 levels "10149.4751953184",..: 5 4 15 6 18 19 13 14 9 12 ...
 $ Beula        : num  21674 21308 19905 20817 20017 ...

> head(hour_2)
  Strain      second_hour    second_hour_n    Beula
1      1 19354.4444444444 12103.3628274451 21673.72
2      2 20021.2222222222 11577.7991047524 21307.61
3      3 16105.9444444444 14425.8808435683 19905.39
4      4 18993.3888888889 12149.3204615723 20816.78
5      5 15541.3888888889 15370.8433645383 20016.94
6      6 14767.1666666667 16288.3635541566 19000.44

I would like a scatter plot that shows each explanatory value for each strain, color-coded by variable.

In my current attempt I first melt the dataframe using the following code:

> hour_2_melted <- melt(hour_2, id.vars = "Strain")
Warning message:
attributes are not identical across measure variables; they will be dropped

Then I plot:

ggplot(hour_2_melted, aes(Strain, value)) + geom_point() 

However, the y axis cannot be changed because it is continuous, and I do not want every value to be shown on it. The x axis is also in a strange order. Lastly, how do I color-code the three explanatory values?

Any help is appreciated.

2 Comments

  • Can you provide a reproducible example of your dataset? See: stackoverflow.com/questions/5963269/… (Commented Jan 8, 2020 at 0:43)
  • The unexpected plot arises because three columns of your data were imported as "factor" when you almost certainly want them as numeric. Either fix this upstream at import, or convert with hour_2$Strain = as.numeric(as.character(hour_2$Strain)), and likewise for the other columns. (Commented Jan 8, 2020 at 0:59)

1 Answer

You can use the pivot_longer function from the tidyr package to reshape your data into the long format that ggplot2 works with:

library(tidyr)
library(dplyr)
df %>% pivot_longer(., - Strain, names_to = "Variable", values_to = "Value")

# A tibble: 18 x 3
   Strain Variable       Value
    <int> <chr>          <dbl>
 1      1 second_hour   19354.
 2      1 second_hour_n 12103.
 3      1 Beula         21674.
 4      2 second_hour   20021.
 5      2 second_hour_n 11578.
 6      2 Beula         21308.
 7      3 second_hour   16106.
 8      3 second_hour_n 14426.
 9      3 Beula         19905.
10      4 second_hour   18993.
11      4 second_hour_n 12149.
12      4 Beula         20817.
13      5 second_hour   15541.
14      5 second_hour_n 15371.
15      5 Beula         20017.
16      6 second_hour   14767.
17      6 second_hour_n 16288.
18      6 Beula         19000.

For plotting, you can pipe the reshaped data straight into ggplot:

library(tidyr)
library(dplyr)
library(ggplot2)
df %>% pivot_longer(., - Strain, names_to = "Variable", values_to = "Value") %>%
  ggplot(aes(x = Strain, y = Value, color = Variable))+
  geom_point()
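
On the y-axis part of the question: once the values are numeric, ggplot2 draws a continuous axis with a handful of breaks rather than one label per value. If specific tick positions are wanted, scale_y_continuous() accepts them through its breaks argument. The following is a minimal sketch, assuming the six-row example data from the Data section (values rounded here for brevity); the names long_df and p and the tick spacing of 2500 are chosen only for illustration:

```r
library(tidyr)
library(ggplot2)

# Example data (rounded copy of the six rows shown in the question)
df <- data.frame(
  Strain = 1:6,
  second_hour = c(19354.44, 20021.22, 16105.94, 18993.39, 15541.39, 14767.17),
  second_hour_n = c(12103.36, 11577.80, 14425.88, 12149.32, 15370.84, 16288.36),
  Beula = c(21673.72, 21307.61, 19905.39, 20816.78, 20016.94, 19000.44)
)

# One row per Strain/Variable pair
long_df <- pivot_longer(df, -Strain, names_to = "Variable", values_to = "Value")

p <- ggplot(long_df, aes(x = Strain, y = Value, color = Variable)) +
  geom_point() +
  scale_x_continuous(breaks = df$Strain) +                     # one tick per strain
  scale_y_continuous(breaks = seq(10000, 22500, by = 2500))    # illustrative spacing
print(p)
```

The breaks are arbitrary; any numeric vector covering the range of the data works the same way.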

Regarding the order of the x axis: using the code above and the reproducible example I provide (see Data below), I can't reproduce your issue, even if I convert Strain to a factor before reshaping the data frame:

library(tidyr)
library(dplyr)
library(ggplot2)
df$Strain <- as.factor(df$Strain)
df %>% pivot_longer(., - Strain, names_to = "Variable", values_to = "Value") %>%
  ggplot(aes(x = Strain, y = Value, color = Variable))+
  geom_point()
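
A possible explanation for the difference, sketched under the assumption that Strain was read in as text in the original data (the str() output in the question lists the levels as "1", "10", "11", ...): as.factor() on an integer column orders the levels numerically, whereas a factor built from character strings orders them alphabetically. The variable names below are hypothetical:

```r
# Strain stored as text, as the str() output in the question suggests
strain_chr <- as.character(1:30)

# Levels of a factor built from text are sorted alphabetically: "10" precedes "2"
f_default <- factor(strain_chr)
head(levels(f_default), 3)

# Supplying the levels in numeric order keeps a discrete axis in numeric order
f_ordered <- factor(strain_chr,
                    levels = as.character(sort(as.numeric(strain_chr))))
head(levels(f_ordered), 3)
```

Setting the levels explicitly is only needed if Strain should stay a discrete axis; converting it to numeric sidesteps the ordering problem altogether.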

However, based on the structure of your data frame, I would recommend converting your factor columns to numeric values:

hour_2$Strain <- as.numeric(as.vector(hour_2$Strain))
hour_2$second_hour <- as.numeric(as.vector(hour_2$second_hour))
hour_2$second_hour_n <- as.numeric(as.vector(hour_2$second_hour_n))
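
The intermediate as.vector() (or as.character()) step matters: calling as.numeric() directly on a factor returns the internal level codes, not the numbers printed as labels. A small sketch with made-up values (the vector f is hypothetical, not taken from the question's data):

```r
# A factor whose labels look like numbers
f <- factor(c("19354.4", "20021.2", "16105.9"))

# Direct conversion yields the position of each value among the sorted levels
codes <- as.numeric(f)

# Going through character first recovers the actual measurements
values <- as.numeric(as.character(f))

print(codes)
print(values)
```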

Does this answer your question?

Data

df <- structure(list(Strain = 1:6, second_hour = c(19354.4444444444, 
20021.2222222222, 16105.9444444444, 18993.3888888889, 15541.3888888889, 
14767.1666666667), second_hour_n = c(12103.3628274451, 11577.7991047524, 
14425.8808435683, 12149.3204615723, 15370.8433645383, 16288.3635541566
), Beula = c(21673.72, 21307.61, 19905.39, 20816.78, 20016.94, 
19000.44)), class = "data.frame", row.names = c(NA, -6L))

Data 2

structure(list(Strain = c(1L, 2L, 21L, 44L, 5L, 6L), second_hour = c(19354.4444444444, 
20021.2222222222, 16105.9444444444, 18993.3888888889, 15541.3888888889, 
14767.1666666667), second_hour_n = c(12103.3628274451, 11577.7991047524, 
14425.8808435683, 12149.3204615723, 15370.8433645383, 16288.3635541566
), Beula = c(21673.72, 21307.61, 19905.39, 20816.78, 20016.94, 
19000.44)), class = "data.frame", row.names = c(NA, -6L))

2 Comments

Oh wow, your answer helped a lot. Thank you for introducing me to this pivot longer function!!
Happy to help ;)
