I have the following DATA
and the following code, which builds a binomial logistic regression model in which all variables are factors:
#setwd("wherever you downloaded the file")
data_ev <- read.csv("all_EV.csv")
df_all_EV <- data.frame(data_ev)
#remove extra columns (note: column indices shift after each removal)
df_all_EV <- df_all_EV[,-1]
df_all_EV <- df_all_EV[,-3]
df_all_EV <- df_all_EV[,-4]
#remove unneeded rows
df_ev2 <- subset(df_all_EV, EV!="unknown")
#factorize
df_ev2$EV <- as.factor(df_ev2$EV)
df_ev2$Speech_VP <- as.factor(df_ev2$Speech_VP)
df_ev2$Genre <- as.factor(df_ev2$Genre)
#set response variable ref level
df_ev2$EV <- relevel(df_ev2$EV, ref = "self")
#create glm object
ev2.glm <- glm(EV ~ Genre + Speech_VP, data = df_ev2, family = binomial)
summary(ev2.glm)
#plot glm
library(visreg)
visreg(ev2.glm, "Speech_VP")
visreg(ev2.glm, "Genre")
visreg(ev2.glm, "Speech_VP", by = "Genre")
This produces a logistic regression with the following output:
Call:
glm(formula = EV ~ Genre + Speech_VP, family = binomial, data = df_ev2)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0115  -0.4628  -0.1381   0.5326   3.0519
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -4.6475     0.5115  -9.086  < 2e-16 ***
GenreTN       4.0611     0.4379   9.274  < 2e-16 ***
Speech_VPN    2.4675     0.3762   6.559  5.4e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 434.02 on 313 degrees of freedom
Residual deviance: 227.37 on 311 degrees of freedom
AIC: 233.37
Number of Fisher Scoring iterations: 5
I am interested in visualizing this regression model, so I use the visreg package.
For example, this plot shows the fitted values for the Genre variable:

However, there's a problem here. I believe the y-axis shows log odds, but the log-odds estimates in the summary of the glm object don't seem to match the plot. For Genre, the summary gives an estimate of 4.06 for TN, but in the plot the blue line (fitted values) for TN seems to be around 2.
The same is true of the other variable: the plot captures the overall relationship well, but I can't figure out what the y-axis is supposed to line up with.
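To make the comparison concrete, here is a minimal sketch that computes the log odds implied by the reported coefficients for each combination of factor levels. It assumes R's default treatment (dummy) coding, under which each coefficient is a difference from the reference level rather than a predicted value. The names `b0`, `b_genre_tn`, `b_speech_n`, and `grid` are mine, used only for illustration.

```r
# Coefficients copied from the summary(ev2.glm) output above
b0         <- -4.6475  # (Intercept): both factors at their reference levels
b_genre_tn <-  4.0611  # GenreTN
b_speech_n <-  2.4675  # Speech_VPN

# 0/1 indicator coding for every combination of the two factors
# (assumes default treatment contrasts)
grid <- expand.grid(GenreTN = c(0, 1), Speech_VPN = c(0, 1))

# Linear predictor, i.e. the implied log odds for each combination
grid$log_odds <- b0 + b_genre_tn * grid$GenreTN + b_speech_n * grid$Speech_VPN

# The same values on the probability scale
grid$prob <- plogis(grid$log_odds)

print(grid)
```

These are the values I would expect the y-axis to line up with if it shows log odds for some combination of levels. I have not confirmed which combination, if any, visreg uses.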
So what accounts for this apparent disconnect between what the model summary shows and what the plot of the model shows?