stat_regline_equation() doesn't match stat_smooth() line

Question

I'm getting an odd problem where I've attempted to plot a scatter graph and fit a quadratic regression line to it. I used stat_smooth() to make the line, and stat_regline_equation() to print the equation, but the equation that appears doesn't match the line.

Why could this be?

Here is the scatter plot with a quadratic regression line fitted and an equation that does not match the line:

Here's a demo of the code I'm using:

#LIBRARIES
library(tidyverse)
library(ggpubr)
library(ggplot2)

test_db <- data.frame(
  X = c(0,-0.1297,-0.2185,-0.2795),
  Y = c(0.7569,0.7396,0.6561,0.5476)
  )


test_plot = ggscatter(test_db, x = "X", y = "Y",
                       xlim = c(-1,0),
                       ylim = c(0, 1),
                       fill = "red",
                       shape = 23
)+
  stat_regline_equation(label.x = -0.95, label.y = 0.15,  #print equation for regression line
                        formula = y ~ poly(x, 2),         #set regression line as quadratic
                        show.legend=FALSE,
  )+
  stat_smooth(method = "lm",               #plot smoothed conditional mean
              formula = y ~ poly(x, 2),    #set regression line as quadratic
              colour="black", size=0.5,
              fill="red"
              ) + 
  theme_bw() + 
  labs(x = "X", 
       y = "Y",
       title = "Test plot")
test_plot

It looks to me like the "quadratic polynomial fit" text equation displayed is completely wrong. A quick and dirty test in Excel shows a least squares fit equation of y = -4.2424x2 - 0.4473x + 0.7562 with R² = 0.9986 and that is good to 2-3 sig fig at predicting the y-values. I'd prefer the graph with x axis zoomed in to -0.3 to 0. — Martin Brown
– Martin Brown, Commented Nov 5 at 17:54

MrFlick · Accepted Answer · 2025-11-05 18:01:07Z

2

By default poly will create orthogonal polynomials, it's not the same as just doing x and x^2. If you want that, add raw=TRUE to the poly call. For example

ggscatter(test_db, x = "X", y = "Y",
          xlim = c(-1,0),
          ylim = c(0, 1),
          fill = "red",
          shape = 23
)+
  stat_regline_equation(label.x = -0.95, label.y = 0.15,  #print equation for regression line
                        formula = y ~ poly(x, 2, raw=TRUE),         #set regression line as quadratic
                        show.legend=FALSE,
  )+
  stat_smooth(method = "lm",               #plot smoothed conditional mean
              formula = y ~ poly(x, 2, raw=TRUE),    #set regression line as quadratic
              colour="black", size=0.5,
              fill="red"
  ) +
  theme_bw() + 
  labs(x = "X", 
       y = "Y",
       title = "Test plot") + 
  stat_function(fun=function(x) 0.68+0.15*x-0.072 * x^2, col="green") + 
  stat_function(fun=function(x) 0.76-0.45*x-4.2 * x^2, col="purple")

Either use

formula = y ~ poly(x, raw=TRUE)
# or
formula = y ~ x + I(x^2)

answered Nov 5 at 18:01

MrFlick

209k19 gold badges300 silver badges324 bronze badges

Sign up to request clarification or add additional context in comments.

10 Comments

Martin Brown Nov 5 at 19:09

In what sense are you claiming that the green line is any kind of sane fit to these data? Fitting orthogonal polynomials by least squares improves the condition of the matrix but it doesn't materially alter the fit (unless the polynomial is quite large and the Vandermonde fitting matrix ill conditioned).

MrFlick Nov 5 at 21:58

The question wasn't about model fitting. If it was, I would have redirected to Cross Validated. The question was about why doesn't the printed equation match the drawn line. And that was because of how poly was being used. That's what this answer is about. The orthogonal normals change the x values so they don't match what's being shown as the x axis on the plot.

Martin Brown Nov 5 at 22:14

So can you explain how to interpret this strange usage of orthogonal polynomials? It is clear that to get sane answers you need to override the default Poly(x,2) with raw=TRUE but why is a Polynomial fitting method that gets wrong non-physical answers the default? What purpose does it serve?

MrFlick Nov 6 at 0:33

Because if you just use squared terms, they values can be correlated. Typically you want independent predictors for best fit. So the poly function tries to make them less correlated but still meaningful. You can check out the reference in the poly help page for more info, or look into "Gram–Schmidt orthogonalization". It just has better numerical properties at the expense of interpretability.

user2554330 Nov 7 at 10:00

I think the coefficients using the orthogonal polynomials are sometimes more interpretable than the power basis coefficients. If you want to answer the question "Do I need a quadratic term?", just look at the quadratic term in the orthogonal parametrization. Using the power series you need to fit both models and do an F test.

user2554330 Nov 6 at 10:04

You (and the referenced dup) seem to be making excuses for the buggy behaviour of stat_regline_equation. The formula it was given was y ~ poly(x, 2), but it is displaying the equation using the coefficients as though the formula was y ~ poly(x, 2, raw = TRUE). That's a bug. Why not report it to the author?

Martin Brown Nov 7 at 9:21

@user2554330 I suggest that you vote to re-open this question. The closed as a duplicate is bogus. The OP has clearly demonstrated a serious fault in R stat_regline_equation and y ~ poly(x,2) which is returning an equation it claims is in powers of user coordinate x but is actually in the rescaled [-1,1] coordinate system of the orthogonal polynomial basis functions used internally for the fit. There is no good reason why it cannot return a polynomial scaled correctly in user coordinates.

user2554330 Nov 7 at 9:55

I think this is a dup of the previous question, which also demonstrated the bug. But you seem to imply this is a bug in R. It's not, it's a bug in the contributed package ggpubr. There's nothing wrong with poly(x, 2). It is behaving as documented.

MrFlick Nov 7 at 15:05

But what if the formula contained other functions than just poly()? It would be impossible for the labeling function to know how to transform the output of an arbitrary function into user coordinates. While the OP hasn't responded, it seems the mistake was assuming poly(x, 2) is the same as x + x^2 which it is not. There's nothing to suggest they ever wanted orthogonalization.

CJG Nov 17 at 11:24

Thank you, this has solved my problem - I am now getting the results expected. I was not wanting orthogonalisation.

Collectives™ on Stack Overflow

stat_regline_equation() doesn't match stat_smooth() line

1 Answer 1

10 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

10 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related