
I am trying to create a scatterplot matrix from my dataset so that in the resulting matrix:

  • Points are grouped in two ways:
    • Quarter of the year (shown as the colour of the points)
    • Day type (shown as the shape of the points: weekend, or casual day between Monday and Friday)
  • The x and y axes are logarithmic.
  • Axis tick labels are not log-transformed, i.e. the axes should show the original integer values between 0 and 350, not their log10 counterparts.
  • The upper panels show the correlation values for each quarter.
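
To make the axis requirement concrete, here is a minimal base-R sketch on made-up values (not the real data). It contrasts log-scaled axes, where tick labels stay in the original units, with log-transformed data, where the ticks show log10 values. The vectors x and y are illustrative assumptions.

```r
# Null PDF device so the sketch runs without opening a window
pdf(NULL)

# Made-up values in the same 1..350 range as the data (assumption)
x <- c(2, 15, 90, 350)
y <- c(350, 40, 7, 1)

# Log-scaled axes: positions are logarithmic, tick values stay in original units
plot(x, y, log = "xy")
ticks_log_axis <- axTicks(1)

# Log-transformed data: the axis is linear and shows log10 values instead
plot(log10(x), log10(y))
ticks_log_data <- axTicks(1)

invisible(dev.off())
```

The first set of ticks is what the requirements above ask for; the second is what has to be avoided.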

So far I've tried the following functions:

  1. pairs()
  2. ggpairs() [from GGally package]
  3. scatterplotMatrix()
  4. splom()

But I haven't been able to get decent results with any of these functions; every time, at least one of my requirements is not met.

  • With pairs(), I'm able to create the scatterplot matrix, but the parameter log="xy" somehow removes the variable names from the diagonal of the resulting matrix.
  • ggpairs() doesn't support logarithmic scales directly, but I created a function that goes through the diagonal and lower triangle of the scatterplot matrix, based on this answer. Although the logarithmic scaling works on the lower triangle, it messes up the variable labels and tick values.

The function is defined and used as follows:

ggpairs_logarithmize <- function(a) { # parameter a is a ggpairs sp-matrix
        max_limit <- sqrt(length(a$plots))
        for(row in 1:max_limit) { # index 1 is used to go through the diagonal also
                for(col in 1:row) { # lower triangle, including the diagonal
                        subsp <- getPlot(a,row,col)
                        subspnew <- subsp + scale_y_log10() + scale_x_log10()
                        subspnew$type <- 'logcontinous'
                        subspnew$subType <- 'logpoints'
                        a <- putPlot(a,subspnew,row,col)
                }
        }
        return(a)
}
scatplot <- ggpairs(...)
scatplot_log10 <- ggpairs_logarithmize(scatplot)
scatplot_log10
  • scatterplotMatrix() didn't seem to support two groupings. I was able to plot quarter and day type separately, but I need both groupings in the same plot.
  • splom() also converts the axis tick labels to logarithmic values, and these should be kept as they are (integers between 0 and 350).

Are there any simple solutions available to create a scatterplot matrix with logarithmic axes with the requirements I have?

EDIT (13.7.2012): Example data and output were requested. Here are some code snippets that produce a demo dataset:

Declare necessary functions

logarithmize <- function(a)
{
        max_limit <- sqrt(length(a$plots))
        for(j in 1:max_limit) {
                for(i in j:max_limit) {
                        subsp <- getPlot(a,i,j)
                        subspnew <- subsp + scale_y_log10() + scale_x_log10()
                        subspnew$type <- 'logcontinous'
                        subspnew$subType <- 'logpoints'
                        a <- putPlot(a,subspnew,i,j)
                }
        }
        return(a)
}

add_quarters <- function(a,datecol,targetcol) {
    for(i in 1:nrow(a)) {
        month <- 1+as.POSIXlt(as.Date(a[i,datecol]))$mon
        if ( month <= 3 ) { a[i,targetcol] <- "Q1" }
        else if (month <= 6 && month > 3) { a[i,targetcol] <- "Q2" }
        else if ( month <= 9 && month > 6 ) { a[i,targetcol] <- "Q3" }
        else if ( month > 9 ) { a[i,targetcol] <- "Q4" }
    }
    return(a)
}
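
As a side note, the row-by-row loop in add_quarters could probably be replaced by a vectorised call. The sketch below assumes the date column can be coerced with as.Date(); add_quarters_vec and the demo data frame are hypothetical names that are not part of the original post.

```r
# Hypothetical vectorised variant of add_quarters (illustration only)
add_quarters_vec <- function(a, datecol, targetcol) {
  # quarters() is base R and returns "Q1".."Q4" for Date input
  a[[targetcol]] <- quarters(as.Date(a[[datecol]]))
  a
}

# Small made-up data frame, one date per quarter
demo <- data.frame(Date = as.Date(c("2010-02-01", "2010-05-15",
                                    "2010-08-31", "2010-11-30")))
demo <- add_quarters_vec(demo, "Date", "Quarter")
```

This avoids the explicit month comparisons and scales better with the number of rows.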

Create dataset:

days <- seq.Date(as.Date("2010-01-01"),as.Date("2012-06-06"),"day")
bananas <- sample(1:350,length(days), replace=T)
apples <- sample(1:350,length(days), replace=T)
oranges <- sample(1:350,length(days), replace=T)
weekdays <- c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")
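# stringsAsFactors=TRUE keeps Dayofweek a factor, which the levels() calls below rely on
fruitsales <- data.frame(Date=days,Dayofweek=rep(weekdays,length.out=length(days)),Bananas=bananas,Apples=apples,Oranges=oranges,stringsAsFactors=TRUE)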
fruitsales[5:6,"Quarter"] <- NA
fruitsales[6:7,"Daytype"] <- NA
fruitsales$Daytype <- fruitsales$Dayofweek
levels(fruitsales$Daytype) # Confirm the day type levels before assigning new levels
levels(fruitsales$Daytype) <- c("Casual","Casual","Weekend","Weekend","Casual","Casual","Casual")
fruitsales <- add_quarters(fruitsales,1,6)

Execute (NOTE! Windows/Mac users, replace x11() with the graphics device for your OS)

# install.packages("GGally")
require(GGally)
x11(); ggpairs(fruitsales,columns=3:5,colour="Quarter",shape="Daytype")
x11(); logarithmize(ggpairs(fruitsales,columns=3:5,colour="Quarter",shape="Daytype"))
  • Why can't you just do pairs(log(dat)) as outlined here? Commented Jul 5, 2012 at 15:00
  • pairs() does work for creating the scatterplot matrix, but unfortunately by calling e.g. pairs(log(data),col=palette[unclass(data$Quarter)],pch=c(4,19)[mod$Daytype]) the axis annotations are also now logarithmic. And as I stated, they are needed in the original form as integers between 0 and 350, not in their logarithmic values from 0 to ~6. Commented Jul 6, 2012 at 9:37
  • Could you show sample data and desired output? Handwritten one is sufficient. Commented Jul 13, 2012 at 2:46
  • Added an example dataset creation and plottings, cheers! Commented Jul 13, 2012 at 7:52

1 Answer

The problem with pairs stems from the use of user co-ordinates in a log coordinate system. Specifically, when adding the labels on the diagonals, pairs sets

par(usr = c(0, 1, 0, 1))

however, if you specify a log coordinate system via log = "xy", what you need here is

par(usr = c(0, 1, 0, 1), xlog = FALSE, ylog = FALSE) 

see this post on R help.
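
A minimal sketch of what is going on, assuming a plain plot() call behaves like a single pairs panel here: with log axes active, par("usr") is reported in log10 units, so code that assumes a 0..1 linear square needs the log flags switched off as well.

```r
# Null PDF device so the sketch runs without opening a window
pdf(NULL)

# Empty plot with log-scaled axes over the data range used in the question
plot(c(1, 350), c(1, 350), log = "xy", type = "n")

usr_log <- par("usr")   # axis limits, reported as log10 of the data limits
is_log  <- par("xlog")  # TRUE while the x axis is logarithmic

# The call suggested above: linear unit square for placing text
par(usr = c(0, 1, 0, 1), xlog = FALSE, ylog = FALSE)
text(0.5, 0.5, "label")

invisible(dev.off())
```

Printing usr_log before the switch shows limits of only a few units, even though the data run up to 350.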

This suggests the following solution (using data given in question):

## adapted from panel.cor in ?pairs
panel.cor <- function(x, y, digits=2, cex.cor, quarter, ...)
{
  usr <- par("usr"); on.exit(par(usr))
  ## switch the panel to a linear unit square so text() can use 0..1 positions
  par(usr = c(0, 1, 0, 1), xlog = FALSE, ylog = FALSE)
  ## one correlation per quarter; rev() so that Q1 ends up at the top
  r <- rev(tapply(seq_along(quarter), quarter, function(id) cor(x[id], y[id])))
  txt <- format(c(0.123456789, r), digits=digits)[-1]
  txt <- paste(names(txt), txt)
  if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
  ## the four y positions assume exactly four quarters
  text(0.5, c(0.2, 0.4, 0.6, 0.8), txt)
}

pairs(fruitsales[,3:5], log = "xy", 
      diag.panel = function(x, ...) par(xlog = FALSE, ylog = FALSE),
      label.pos = 0.5,
      col = unclass(factor(fruitsales[,6])), 
      pch = unclass(fruitsales[,7]), upper.panel = panel.cor, 
      quarter = factor(fruitsales[,6]))

This produces the following plot

[Figure: pairs plot on log coordinate system]
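
The per-group correlation step inside panel.cor can also be tried on its own, outside any plotting code. The following sketch uses made-up vectors (x, y and g are assumptions for illustration) and the same tapply() over row indices.

```r
# Made-up data: two groups of four points each
x <- 1:8
y <- c(1, 2, 3, 4, 4, 3, 2, 1)
g <- factor(rep(c("Q1", "Q2"), each = 4))

# Split the row indices by group, then correlate x and y within each group
r <- tapply(seq_along(g), g, function(id) cor(x[id], y[id]))

# Same labelling as in panel.cor: group name followed by the rounded value
txt <- paste(names(r), format(r, digits = 2))
```

Passing indices rather than values to tapply() is what lets the function see both x and y for each group.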

1 Comment

Thank you for your detailed solution! This works perfectly.