
I'm trying to make a line chart of the density of different species along a vegetation transect.

There are two transects (Transect A and Transect B), which I would like to plot separately.

"Cover" describes the percentage cover of each species within a zone. Start..m. and End..m. are the start and end points (in metres) of each zone; some zones are longer than others.

> str(veg2)
'data.frame':   237 obs. of 8 variables:
 $ Transect            : chr  "Transect A" "Transect A" "Transect A" "Transect A" ...
 $ Zone                : chr  "A1" "A1" "A1" "A1" ...
 $ Scientific.name     : chr  "Bromus diandrus" "Panicum hillmanii" "Lolium rigidum" "Rumex brownii" ...
 $ Common.name..lookup.: chr  "Great Brome" "Witch Panic" "Wimmera Rye-grass" "Slender Dock" ...
 $ Density             : chr  "D" "Sp" "VS" "Sc" ...
 $ Cover               : int  85 20 5 1 1 1 20 1 5 5 ...
 $ Start..m.           : num  0 0 0 0 0 0 1.69 1.69 1.69 1.69 ...
 $ End..m.             : num  1.69 1.69 1.69 1.69 1.69 1.69 5.13 5.13 5.13 5.13 ...

I also have information about each species. I'm mostly interested in "Wetland Dependence" (a series of one- or two-letter codes) and whether the species is indigenous or introduced.

> str(spp)
'data.frame':   270 obs. of 4 variables:
 $ Scientific.Name   : chr  "Aira caryophyllea subsp. caryophyllea" "Aira cupaniana" "Alternanthera denticulata s.s." "Althenia australis" ...
 $ Common.Name       : chr  "Silvery Hair-grass" "Quicksilver Grass" "Lesser Joyweed" "Austral Water-mat" ...
 $ Wetland.Dependence: chr  "T" "T" "MF" "OA" ...
 $ Origin_Desc       : chr  "Introduced" "Introduced" "Indigenous" "Indigenous" ...

First, I join the two datasets:

library(dplyr)

veg_join <- veg2 %>%
  left_join(
    spp %>% select(Scientific.Name, Origin_Desc, Wetland.Dependence),
    by = c("Scientific.name" = "Scientific.Name")
  ) %>%
  mutate(
    Origin_Desc = ifelse(is.na(Origin_Desc), "Other", Origin_Desc),
    Origin_Desc = factor(Origin_Desc,
                         levels = c("Indigenous", "Introduced", "Other"))
  ) %>%
  arrange(Origin_Desc, Wetland.Dependence, Scientific.name) %>%
  mutate(
    Scientific.name = factor(Scientific.name,
                             levels = unique(Scientific.name))
  )
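As the comments point out, readers don't have veg2 or spp. A minimal reproducible sketch could look like the following. The species rows, Wetland.Dependence codes, covers and origins are hypothetical (invented for illustration; only the column names follow the str() output above), and base R's merge() and order() stand in for the dplyr pipeline so the example runs without extra packages.

```r
# Hypothetical toy versions of veg2 and spp (values invented for illustration)
veg2 <- data.frame(
  Transect        = "Transect A",
  Zone            = "A1",
  Scientific.name = c("Bromus diandrus", "Rumex brownii", "Lolium rigidum",
                      "Althenia australis", "Bare ground"),
  Cover           = c(85L, 1L, 5L, 20L, 10L),
  Start..m.       = 0,
  End..m.         = 1.69,
  stringsAsFactors = FALSE
)
spp <- data.frame(
  Scientific.Name    = c("Althenia australis", "Bromus diandrus",
                         "Lolium rigidum", "Rumex brownii"),
  Wetland.Dependence = c("OA", "T", "T", "MF"),
  Origin_Desc        = c("Indigenous", "Introduced", "Introduced", "Indigenous"),
  stringsAsFactors = FALSE
)

# Base-R equivalent of the left_join() / mutate() / arrange() steps above
veg_join <- merge(veg2, spp, by.x = "Scientific.name",
                  by.y = "Scientific.Name", all.x = TRUE)
veg_join$Origin_Desc[is.na(veg_join$Origin_Desc)] <- "Other"
veg_join$Origin_Desc <- factor(veg_join$Origin_Desc,
                               levels = c("Indigenous", "Introduced", "Other"))
veg_join <- veg_join[order(veg_join$Origin_Desc, veg_join$Wetland.Dependence,
                           veg_join$Scientific.name), ]

# Levels taken from unique() follow the row order produced by order()
veg_join$Scientific.name <- factor(veg_join$Scientific.name,
                                   levels = unique(veg_join$Scientific.name))

# Gives: Rumex brownii, Althenia australis, Bromus diandrus,
#        Lolium rigidum, Bare ground
levels(veg_join$Scientific.name)
```

Within Indigenous, Rumex brownii comes before Althenia australis only because its made-up code MF sorts before OA, which reproduces the ordering problem described in the question.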

Then I look at Transect A:

# Species to exclude from the plot
remove_spp <- c("unknown thistle", "unidentified herb")

# Keep only Transect A and drop the excluded species
transectA <- veg_join %>%
  filter(Transect == "Transect A", !Scientific.name %in% remove_spp)

Then I plot the data. The thickness of the line segments is Cover, and they are colour-coded by "Wetland Dependence". The plot is split into facets for "Indigenous", "Introduced" and "Other" (for bare ground and water).

It's all looking good EXCEPT the order of the labels on the y-axis (Scientific Name) is messy. Currently, species seem to be ordered by Wetland Dependence within each Origin_Desc facet. However, I'd like it to be alphabetical in each facet (ordered from top to bottom), regardless of Wetland Dependence.

I'm so close but I need a little help tidying up this last little bit!

library(ggplot2)

# fg_cols is a vector of colours for the Wetland.Dependence codes
# (defined elsewhere, not shown). The linewidth aesthetic needs ggplot2 >= 3.4.0.
ggplot(transectA) +
  geom_segment(
    aes(x = Start..m., xend = End..m.,
        y = Scientific.name, yend = Scientific.name,
        linewidth = Cover, colour = Wetland.Dependence),
    lineend = "round"
  ) +
  scale_color_manual(values = fg_cols) +
  scale_linewidth(range = c(1, 4)) +
  ggh4x::facet_grid2(
    Origin_Desc ~ .,
    scales = "free_y",
    space = "free_y"
  ) +
  theme_minimal(base_size = 14) +
  labs(
    title = "Transect A",
    x = "Distance (m)",
    y = "Scientific Name",
    linewidth = "Cover (%)"
  )
  • ggplot will respect the ordering of the factor levels you are using. In this case you used unique(), which does not sort. If you do mutate(Scientific.name = factor(Scientific.name, levels = sort(unique(Scientific.name)))) it should work. Try to give reproducible examples; we do not have your data, and you also failed to include the required library calls in your code. Commented Dec 1 at 13:43
  • To arrange a discrete variable on the y-axis alphabetically from top to bottom, instead of bottom to top (the default), you could use: transectA$Scientific.name <- forcats::fct_rev(transectA$Scientific.name) Commented Dec 1 at 14:57
  • This question is similar to: Sorting a ggplot axis' factor according to another factor's levels. If you believe it's different, please edit the question, make it clear how it's different and/or how the answers on that question are not helpful for your problem. Commented Dec 1 at 17:06
  • The first half of this question seems a distraction; we don't need to know how you formed the data when your key problem is "the order of the labels on the y-axis (Scientific Name) is messy". The problem is that you don't sort it, as KieranMartin said. Since you want a reversed order, you must use factors. Just about every question on SO that references ggplot2 and axis order is resolved quickly with factor(.., levels=..) or something from forcats::fct_*(). Commented Dec 1 at 17:08
  • Thanks for the feedback. The suggestion from KieranMartin works. Commented Dec 1 at 23:59

1 Answer

As mentioned in the comments, ggplot2 always orders a discrete axis by the levels of your factor. If the variable is not a factor, ggplot2 has to choose an order itself; for a character vector, it sorts the values alphabetically. In your case, you define the levels to be unique(Scientific.name), so ggplot2 follows that specific order.
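The difference between the default levels and levels = unique() can be seen without plotting at all. A small base-R sketch, using a hypothetical vector of species names in the order they might appear after sorting by Wetland.Dependence:

```r
# Hypothetical species names in the order they appear in the data
x <- c("Rumex brownii", "Althenia australis", "Lolium rigidum", "Bromus diandrus")

# factor() without 'levels' sorts the unique values alphabetically:
# Althenia australis, Bromus diandrus, Lolium rigidum, Rumex brownii
levels(factor(x))

# levels = unique(x) keeps the order of first appearance instead:
# Rumex brownii, Althenia australis, Lolium rigidum, Bromus diandrus
levels(factor(x, levels = unique(x)))
```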

This means that whatever order Scientific.name appears in within your data (here, the order produced by your arrange() call, which sorts by Wetland.Dependence before Scientific.name) becomes the final ordering on the plot. To understand why this might differ from alphabetical order, consider this example using the mpg dataset.

library(ggplot2)
library(dplyr)
library(ggh4x)

# The 'class' column is not ordered alphabetically
unique(mpg$class)
#> [1] "compact"    "midsize"    "suv"        "2seater"    "minivan"   
#> [6] "pickup"     "subcompact"
# The 'class' column is a character vector, NOT a factor
typeof(mpg$class)
#> [1] "character"
# Since 'class' is character, ordering is alphabetical
ggplot(mpg, aes(displ, class)) +
  geom_point() +
  facet_grid2(cyl ~ drv, scales = "free")


# If I define the order of the levels as 'unique(class)'
# it will follow 'compact', 'midsize', 'suv', ..., 'subcompact'
mpg2 <- mpg |>
  mutate(class = factor(class, levels = unique(class)))
# Notice how 'suv' appears in between 'midsize' and 'minivan'
ggplot(mpg2, aes(displ, class)) +
  geom_point() +
  facet_grid2(cyl ~ drv, scales = "free")


Created on 2025-12-04 with reprex v2.1.1

In short, using unique(colname) to define the levels of your factor can produce unexpected orderings, since the levels follow the row order of your data. It's better to specify the ordering explicitly when creating the factor, or to use {forcats} to reorder your levels appropriately.
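To see why unique() is fragile, a minimal base-R sketch with a hypothetical species column in two row orders: the data are identical, only the row order differs, yet the unique()-based levels change while the sort()-based levels do not.

```r
# Hypothetical species column, and the same values in a different row order
x          <- c("Lolium rigidum", "Althenia australis", "Bromus diandrus")
x_shuffled <- x[c(3, 1, 2)]

# unique(): the levels change when only the row order changes (FALSE)
identical(levels(factor(x, levels = unique(x))),
          levels(factor(x_shuffled, levels = unique(x_shuffled))))

# Explicit sort(): the same levels whatever the row order (TRUE)
identical(levels(factor(x, levels = sort(unique(x)))),
          levels(factor(x_shuffled, levels = sort(unique(x_shuffled)))))
```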

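The solution to your problem is to either leave Scientific.name as a character vector or simply use Scientific.name = as.factor(Scientific.name), since both default to alphabetical order. Note that ggplot2 draws the first level at the bottom of a discrete y-axis, so to read alphabetically from top to bottom (as you asked) you also need to reverse the levels, for example with forcats::fct_rev() as suggested in the comments.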
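A minimal base-R sketch of the top-to-bottom version, on hypothetical joined data (species and origins invented). Here rev(sort(unique())) plays the role of forcats::fct_rev() applied to alphabetical levels. Splitting by Origin_Desc and calling droplevels() approximates what each scales = "free_y" panel keeps, assuming (as is ggplot2's usual behaviour) that species absent from a panel are dropped without changing the relative order of the rest.

```r
# Hypothetical joined data, one row per species (values invented)
veg <- data.frame(
  Scientific.name = c("Rumex brownii", "Althenia australis",
                      "Lolium rigidum", "Bromus diandrus", "Bare ground"),
  Origin_Desc     = factor(c("Indigenous", "Indigenous", "Introduced",
                             "Introduced", "Other"),
                           levels = c("Indigenous", "Introduced", "Other")),
  stringsAsFactors = FALSE
)

# Reverse-alphabetical levels: the first level is drawn at the bottom,
# so the alphabetically first species ends up at the top of the y-axis
veg$Scientific.name <- factor(veg$Scientific.name,
                              levels = rev(sort(unique(veg$Scientific.name))))

# Approximate each free_y facet: keep only the species present in it
per_facet <- lapply(split(veg$Scientific.name, veg$Origin_Desc), droplevels)

# Indigenous: Rumex brownii, Althenia australis (Althenia drawn on top)
# Introduced: Lolium rigidum, Bromus diandrus (Bromus drawn on top)
lapply(per_facet, levels)
```

In the pipeline from the question, the same levels = rev(sort(unique(Scientific.name))) could stand in for levels = unique(Scientific.name) in the final mutate().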
