Here is some test code, and I don't understand why it is not working as expected. It is a ggplot2 question, not an R one.
library(ggplot2)
K = 10
x <- 1:100/100
y <- sapply (x, FUN= function(x) 1+x)
xy <- data.frame(x,y)
set.seed(1234)
xy$yrand <- xy$y + runif(100,min = -0.35, max = 0.5)
folds <- cut(seq(1, nrow(xy)), breaks = K, labels = FALSE)
p1 <- ggplot(xy, aes(x = xy$x, y = xy$yrand))+geom_point() +ggtitle ("Simple
x vs y plot with added random noise") + xlab("X") + ylab("Y")
for(i in 1:K){
#Segment your data by fold using the which() function
testIndexes <- which(folds==i,arr.ind=TRUE)
testData <- xy[testIndexes, ]
trainData <- xy[-testIndexes, ]
lmTemp <- lm(yrand ~ x, data = trainData)
p1 <- p1 + geom_line(data = trainData, aes(x = trainData$x, y = predict(lmTemp, newdata = trainData)))
}
p1
What I would like to see is a plot with 10 lines (the regression lines, one per fold), but I only see one. Can someone help me out? Is it the ggplot2 syntax that is wrong?
Thanks, Umberto
EDITED:
I accepted the answer I got, since it is a nice way of doing it. I just wanted to add a simpler approach: first prepare a data set for the lines I want to draw, then plot it. I think this method is slightly easier to understand if you don't have much R experience.
# Start with an empty object so that rbind() works on the first iteration
fitLines <- NULL
for(i in 1:K){
#Segment your data by fold using the which() function
testIndexes <- which(folds==i,arr.ind=TRUE)
testData <- xy[testIndexes, ]
trainData <- xy[-testIndexes, ]
lmTemp <- lm(yrand ~ x, data = trainData)
# Let's build a data set for the lines: fold label, x, and fitted y
fitLines <- rbind(fitLines, data.frame(set = paste("set",i), x = trainData$x, y = predict(lmTemp, newdata = trainData)))
}
# One line per fold, colored by the fold label
p1 + geom_line(data = fitLines, aes(x = x, y = y, col = set))
And this is what you get:

[Plot: the noisy points with one colored regression line per set]

folds is not defined, so testIndexes is probably empty. Conclusion: in the loop you always use the same data set.
[...] stat_smooth, so you can remove the lm line.
[...] trainData with testData
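
For what it's worth, here is a minimal sketch of what the stat_smooth suggestion could look like. It is my own reading of that comment, not the accepted answer. The names trainAll and p are made up for this example, and it assumes the same setup as in the question. The idea is to stack the K training sets into one long data frame tagged by fold, and let stat_smooth fit one lm per fold. Because the aesthetics refer to columns of the data frame rather than to loop variables such as trainData$x or lmTemp, nothing depends on what those variables hold when the plot is finally drawn.

```r
library(ggplot2)

# Same setup as in the question
K <- 10
x <- 1:100 / 100
xy <- data.frame(x = x, y = 1 + x)
set.seed(1234)
xy$yrand <- xy$y + runif(100, min = -0.35, max = 0.5)
folds <- cut(seq_len(nrow(xy)), breaks = K, labels = FALSE)

# Stack the K training sets into one long data frame, tagged by fold
# (trainAll is a hypothetical name used only in this sketch)
trainAll <- do.call(rbind, lapply(1:K, function(i) {
  d <- xy[folds != i, ]        # training rows: everything except fold i
  d$set <- paste("set", i)     # label the rows with their fold
  d
}))

# stat_smooth fits a separate linear model for each value of `set`
p <- ggplot(xy, aes(x = x, y = yrand)) +
  geom_point() +
  stat_smooth(data = trainAll, aes(colour = set),
              method = "lm", formula = y ~ x, se = FALSE) +
  xlab("X") + ylab("Y")
p
```

The ten fits are very close to each other, since each one is trained on 90 of the same 100 points, so the lines overlap heavily. Coloring by set at least makes it visible that there are ten of them.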