1

I would like to understand how linear.predictors are computed in the output of

pb = glm(formula, family = binomial( link = "probit" ), data)

From my understanding, it should be the product matrix of the observations (N x k) and the estimated coefficients (k x 1), with N = sample size, k = number of variables.

I tried to compute them by hand in two fashions:

rowSums(mapply(`*`,pb$model,pb$coefficients))

and

as.matrix(pb$model)%*%as.matrix(pb$coefficients)

Both gave me the same vector of values that equals pb$linear.predictors for some observations but not all.

Could you help me understand how it is computed and how I could reproduce it by hand?

2 Answers 2

4

If you look at the output of pb$model you'll see this also includes the outcome, y, and you don't want to multiply by the outcome to get predicted values.

Try:

as.matrix(pb$model[, -1])%*%as.matrix(pb$coefficients[-1]) + pb$coefficients[1]

Remove the outcome column from pb$model, and remove the intercept from the coefficients. Then multiply the matrices and add the intercept back in.

You could alternatively replace the first column of pb$model with a column of 1s and multiply by pb$coefficients. This is the approach that follows matrix notation if you look up linear algebra for regression.

Sign up to request clarification or add additional context in comments.

1 Comment

Thank you. I'll go with @Ben Bolker's solution as it is more concise but you provide a great answer on WHY I was wrong.
3

In general it's better to use accessor methods, if they're available, then internal components of the model (accessed via $, or @ for S4 objects); internal components can be poorly documented/surprising and/or subject to change (although the latter is unlikely for a foundational component of R like a glm object).

model.matrix(pb) %*% coef(pb)

should work to compute the linear predictor for a wide range of model types.

To follow up on @nrennie's answer: the first column of the model frame was probably the response variable, while the first element of the coefficient vector is the intercept. If the response was 1.0, then you would correctly add the intercept term to the model. (The model matrix, as opposed to the model frame, has a first column composed entirely of ones ...)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.