0

I'm getting an odd problem where I've attempted to plot a scatter graph and fit a quadratic regression line to it. I used stat_smooth() to make the line, and stat_regline_equation() to print the equation, but the equation that appears doesn't match the line.

Why could this be?

Here is the scatter plot with a quadratic regression line fitted and an equation that does not match the line: scatter plot with a quadratic regression line fitted and an equation that does not match the line

Here's a demo of the code I'm using:

#LIBRARIES
library(tidyverse)
library(ggpubr)
library(ggplot2)

test_db <- data.frame(
  X = c(0,-0.1297,-0.2185,-0.2795),
  Y = c(0.7569,0.7396,0.6561,0.5476)
  )


test_plot = ggscatter(test_db, x = "X", y = "Y",
                       xlim = c(-1,0),
                       ylim = c(0, 1),
                       fill = "red",
                       shape = 23
)+
  stat_regline_equation(label.x = -0.95, label.y = 0.15,  #print equation for regression line
                        formula = y ~ poly(x, 2),         #set regression line as quadratic
                        show.legend=FALSE,
  )+
  stat_smooth(method = "lm",               #plot smoothed conditional mean
              formula = y ~ poly(x, 2),    #set regression line as quadratic
              colour="black", size=0.5,
              fill="red"
              ) + 
  theme_bw() + 
  labs(x = "X", 
       y = "Y",
       title = "Test plot")
test_plot
1
  • 1
    It looks to me like the "quadratic polynomial fit" text equation displayed is completely wrong. A quick and dirty test in Excel shows a least squares fit equation of y = -4.2424x2 - 0.4473x + 0.7562 with R² = 0.9986 and that is good to 2-3 sig fig at predicting the y-values. I'd prefer the graph with x axis zoomed in to -0.3 to 0. Commented Nov 5 at 17:54

1 Answer 1

2

By default poly will create orthogonal polynomials, it's not the same as just doing x and x^2. If you want that, add raw=TRUE to the poly call. For example

ggscatter(test_db, x = "X", y = "Y",
          xlim = c(-1,0),
          ylim = c(0, 1),
          fill = "red",
          shape = 23
)+
  stat_regline_equation(label.x = -0.95, label.y = 0.15,  #print equation for regression line
                        formula = y ~ poly(x, 2, raw=TRUE),         #set regression line as quadratic
                        show.legend=FALSE,
  )+
  stat_smooth(method = "lm",               #plot smoothed conditional mean
              formula = y ~ poly(x, 2, raw=TRUE),    #set regression line as quadratic
              colour="black", size=0.5,
              fill="red"
  ) +
  theme_bw() + 
  labs(x = "X", 
       y = "Y",
       title = "Test plot") + 
  stat_function(fun=function(x) 0.68+0.15*x-0.072 * x^2, col="green") + 
  stat_function(fun=function(x) 0.76-0.45*x-4.2 * x^2, col="purple")

enter image description here Either use

formula = y ~ poly(x, raw=TRUE)
# or
formula = y ~ x + I(x^2)
Sign up to request clarification or add additional context in comments.

10 Comments

In what sense are you claiming that the green line is any kind of sane fit to these data? Fitting orthogonal polynomials by least squares improves the condition of the matrix but it doesn't materially alter the fit (unless the polynomial is quite large and the Vandermonde fitting matrix ill conditioned).
The question wasn't about model fitting. If it was, I would have redirected to Cross Validated. The question was about why doesn't the printed equation match the drawn line. And that was because of how poly was being used. That's what this answer is about. The orthogonal normals change the x values so they don't match what's being shown as the x axis on the plot.
So can you explain how to interpret this strange usage of orthogonal polynomials? It is clear that to get sane answers you need to override the default Poly(x,2) with raw=TRUE but why is a Polynomial fitting method that gets wrong non-physical answers the default? What purpose does it serve?
Because if you just use squared terms, they values can be correlated. Typically you want independent predictors for best fit. So the poly function tries to make them less correlated but still meaningful. You can check out the reference in the poly help page for more info, or look into "Gram–Schmidt orthogonalization". It just has better numerical properties at the expense of interpretability.
I think the coefficients using the orthogonal polynomials are sometimes more interpretable than the power basis coefficients. If you want to answer the question "Do I need a quadratic term?", just look at the quadratic term in the orthogonal parametrization. Using the power series you need to fit both models and do an F test.
You (and the referenced dup) seem to be making excuses for the buggy behaviour of stat_regline_equation. The formula it was given was y ~ poly(x, 2), but it is displaying the equation using the coefficients as though the formula was y ~ poly(x, 2, raw = TRUE). That's a bug. Why not report it to the author?
@user2554330 I suggest that you vote to re-open this question. The closed as a duplicate is bogus. The OP has clearly demonstrated a serious fault in R stat_regline_equation and y ~ poly(x,2) which is returning an equation it claims is in powers of user coordinate x but is actually in the rescaled [-1,1] coordinate system of the orthogonal polynomial basis functions used internally for the fit. There is no good reason why it cannot return a polynomial scaled correctly in user coordinates.
I think this is a dup of the previous question, which also demonstrated the bug. But you seem to imply this is a bug in R. It's not, it's a bug in the contributed package ggpubr. There's nothing wrong with poly(x, 2). It is behaving as documented.
But what if the formula contained other functions than just poly()? It would be impossible for the labeling function to know how to transform the output of an arbitrary function into user coordinates. While the OP hasn't responded, it seems the mistake was assuming poly(x, 2) is the same as x + x^2 which it is not. There's nothing to suggest they ever wanted orthogonalization.
Thank you, this has solved my problem - I am now getting the results expected. I was not wanting orthogonalisation.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.