How are computed linear predictors in R glm probit?

Question

I would like to understand how linear.predictors are computed in the output of

pb = glm(formula, family = binomial( link = "probit" ), data)

From my understanding, it should be the product matrix of the observations (N x k) and the estimated coefficients (k x 1), with N = sample size, k = number of variables.

I tried to compute them by hand in two fashions:

rowSums(mapply(`*`,pb$model,pb$coefficients))

and

as.matrix(pb$model)%*%as.matrix(pb$coefficients)

Both gave me the same vector of values that equals pb$linear.predictors for some observations but not all.

Could you help me understand how it is computed and how I could reproduce it by hand?

nrennie · Accepted Answer · 2024-02-22 22:35:09Z

4

If you look at the output of pb$model you'll see this also includes the outcome, y, and you don't want to multiply by the outcome to get predicted values.

Try:

as.matrix(pb$model[, -1])%*%as.matrix(pb$coefficients[-1]) + pb$coefficients[1]

Remove the outcome column from pb$model, and remove the intercept from the coefficients. Then multiply the matrices and add the intercept back in.

You could alternatively replace the first column of pb$model with a column of 1s and multiply by pb$coefficients. This is the approach that follows matrix notation if you look up linear algebra for regression.

answered Feb 22, 2024 at 22:35

nrennie

2,5771 gold badge7 silver badges20 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

hexolitemax Over a year ago

Thank you. I'll go with @Ben Bolker's solution as it is more concise but you provide a great answer on WHY I was wrong.

Ben Bolker · Accepted Answer · 2024-02-23 15:23:44Z

3

In general it's better to use accessor methods, if they're available, then internal components of the model (accessed via $, or @ for S4 objects); internal components can be poorly documented/surprising and/or subject to change (although the latter is unlikely for a foundational component of R like a glm object).

model.matrix(pb) %*% coef(pb)

should work to compute the linear predictor for a wide range of model types.

To follow up on @nrennie's answer: the first column of the model frame was probably the response variable, while the first element of the coefficient vector is the intercept. If the response was 1.0, then you would correctly add the intercept term to the model. (The model matrix, as opposed to the model frame, has a first column composed entirely of ones ...)

edited Feb 23, 2024 at 15:23

answered Feb 23, 2024 at 0:39

Ben Bolker

231k26 gold badges405 silver badges497 bronze badges

Collectives™ on Stack Overflow

How are computed linear predictors in R glm probit?

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related