Hi everyone, I'm working with ggplot2 to draw a map of my country. Below are the dataset and the script I'm using (prov2022 is the shapefile for the map).

#database
COD_REG   COD_PROV   Wage
1         91         530
1         92         520
1         93         410
2         97         300
2         98         205
2         99         501
13        102        700
13        103        800   
13        159        900
18        162        740
18        123        590   
18        119        420
19        162        340
19        123        290   
19        119        120
#script
right_join(prov2022, database, by = "COD_PROV") %>% 
  ggplot(aes(fill = `Wage`))+
  geom_sf(data = ~ subset(., `Wage` > 300 & `Wage` <= 800)) +
  theme_void() +
  theme(legend.title=element_blank())+
  scale_fill_gradientn(colors = c( 'white', 'yellow' , 'red', 'black')) +
  geom_blank()

It works fine, but I'm also interested in showing the shapes of the areas that I excluded with subset. My goal was to fill with the color gradient only the regions with Wage > 300 & Wage <= 800, but by setting geom_sf(data = ~ subset(., Wage > 300 & Wage <= 800)) I have completely removed from my map the regions that do not satisfy this condition. I actually need them in the output, but without being filled (just their outlines).

How can I solve this?

UPDATE ABOUT SCRIPT: This is what I'm using after @r2evans' suggestion:

right_join(prov2022, database, by = "COD_PROV") %>% 
  ggplot(aes(fill = `Importo medio mensile`))+
  geom_sf(data = ~ transform(., `Importo medio mensile` = ifelse(`Importo medio mensile` > 1500 & `Importo medio mensile` <= 1700, `Importo medio mensile`[NA], `Importo medio mensile`))) + 
  theme_void() +
  theme(legend.title=element_blank())+
  scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black'), na.value = "#00000000") +
  geom_blank()

but R returns:

Error in FUN(X[[i]], ...) : object 'Importo medio mensile' not found

UPDATE PART 2

If I want to fill using another variable, Salario reale, while keeping the selection of areas based on the values of the previous variable, Importo medio mensile, what should I do?

Substituting only the fill variable doesn't work:

right_join(prov2022, database, by = "COD_PROV") %>% 
  ggplot(aes(fill = `Salario Reale`))+
  geom_sf(data = ~ dplyr::mutate(., `Importo medio mensile` = ifelse(`Importo medio mensile` > 1500 & `Importo medio mensile` <= 1700, `Importo medio mensile`, `Importo medio mensile`[NA]))) + 
  theme_void() +
  theme(legend.title=element_blank())+
  scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black'), na.value = "#00000000") +
  geom_blank()

It colors all the regions of my country, as if the subset I want to keep (the one based on the variable Importo medio mensile) weren't there. How can I solve this?

UPDATE 3 The solution proposed by r2evans works!!

  • Hi @io_boh, could you provide a fully reproducible example (maybe using the nc shapefile that ships with sf)? Commented Feb 3, 2023 at 18:40
  • Perhaps you could add a step before ggplot like mutate(Wage = if_else(between(Wage, 300, 800), Wage, NA_real_)) %>% so that Wages outside that range are NA and given a different fill. (by default na.value is "grey50") Commented Feb 3, 2023 at 18:43

1 Answer

Instead of filtering out the data, just replace (optionally inline) the not-to-be-colored values with NA.

Continuing from my previous answer,

ggplot(usa, aes(fill = val)) +
  geom_sf(data = ~ transform(., val = ifelse(val < 0.5, val[NA], val))) + 
  scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black')) + 
  geom_blank()

[Image: ggplot2 map with the other regions' values replaced with NA]

(The use of val[NA] is to make sure we have the one specific class of NA, as there are at least 6 different types of NA.)
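As a small illustration of that point, here is a base-R sketch (the vector x is made up for the example, it is not from the question). When no element passes the test, ifelse() only ever sees the "no" branch, so a bare NA leaves you with a logical vector, whereas x[NA] keeps the type of x:

```r
# Hypothetical values, only for illustration
x <- c(0.2, 0.7)

# No element passes the test, so only the `no` branch is used.
plain_na <- ifelse(x > 5, x, NA)      # bare NA is a logical NA
typed_na <- ifelse(x > 5, x, x[NA])   # x[NA] is a double vector of NAs

typeof(plain_na)
typeof(typed_na)
```

Note that x[NA] is a vector of NAs as long as x (the logical index is recycled), which is fine here because ifelse() picks from it element by element.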

Granted, gray may not be what you want, so you can fix that with na.value= (its default is na.value="grey50").

ggplot(usa, aes(fill = val)) +
  geom_sf(data = ~ transform(., val = ifelse(val < 0.5, val[NA], val))) + 
  scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black'), 
                       na.value = "#00000000") + 
  geom_blank()

[Image: same map, but with the non-colored states now drawn with a transparent fill]

where #00000000 is a fully transparent color. The first six 0s don't matter; the trailing two 0s indicate an alpha of 0 (transparent).
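If you want to convince yourself of how the 8-digit hex code is read, a quick check with grDevices::col2rgb() (base R) could look like this. The two extra colors are only there for comparison and are not used anywhere in the answer:

```r
# Decode three hex colors into their red/green/blue/alpha components
rgba <- grDevices::col2rgb(c("#00000000", "#FF000000", "#000000FF"),
                           alpha = TRUE)

# The alpha row: 0 means fully transparent, 255 means fully opaque
alpha_channel <- unname(rgba["alpha", ])
alpha_channel
```

The first two colors differ in their RGB part but both have alpha 0, so either would be invisible as na.value; the third is opaque black.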


I think this means you want something like this:

right_join(prov2022, database, by = "COD_PROV") %>% 
  ggplot(aes(fill = `Importo medio mensile`))+
  geom_sf(data = ~ mutate(., `Importo medio mensile` = ifelse(`Importo medio mensile` > 300 & `Importo medio mensile` <= 800, `Importo medio mensile`, `Importo medio mensile`[NA]))) +
  theme_void() +
  theme(legend.title=element_blank())+
  scale_fill_gradientn(colors = c( 'white', 'yellow' , 'red', 'black')) +
  geom_blank()

Notes:

  • I updated from `Wage` in your example to `Importo medio mensile` as you mentioned in your comments;
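  • My code above uses transform, which is base R and in general works fine, except when the column names are "not normal R names" (e.g., they contain spaces), in which case it tends to replace the offending characters with .s. The column that aes(fill = ...) refers to then no longer exists under its original name, hence the "object not found" error. Using dplyr::mutate fixes this problem. You're already using right_join, so I don't think I'm adding any dependency.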
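To see the renaming problem in isolation, here is a base-R sketch on a plain data frame (the three values are made up, and this is not an sf object, so treat it as an illustration of the mechanism only). It compares transform() with a direct column assignment; the direct assignment is used here just to keep the example free of package dependencies, and dplyr::mutate is what the code above uses instead:

```r
# Hypothetical data with a non-syntactic column name
df <- data.frame(`Importo medio mensile` = c(1400, 1600, 1800),
                 check.names = FALSE)

# transform() rebuilds the data frame and sanitizes the column names
via_transform <- transform(
  df,
  `Importo medio mensile` = ifelse(`Importo medio mensile` > 1500,
                                   `Importo medio mensile`, NA_real_)
)

# Direct assignment leaves the column name untouched
via_assign <- df
via_assign[["Importo medio mensile"]] <- ifelse(
  df[["Importo medio mensile"]] > 1500,
  df[["Importo medio mensile"]], NA_real_
)

names(via_transform)
names(via_assign)
```

After transform() the column is called Importo.medio.mensile, so a later reference to `Importo medio mensile` cannot find it.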

Another way to look at this: data = ~ mutate(...) changes the data only inside that layer, so the original data is untouched. One could just as easily do something like this for the same effect:

right_join(prov2022, database, by = "COD_PROV") %>% 
  mutate(SOMETHING = ifelse(`Importo medio mensile` > 300 & `Importo medio mensile` <= 800, `Importo medio mensile`, `Importo medio mensile`[NA])) %>%
  ggplot(aes(fill = SOMETHING)) +
  geom_sf() +
  theme_void() +
  theme(legend.title=element_blank())+
  scale_fill_gradientn(colors = c( 'white', 'yellow' , 'red', 'black')) +
  geom_blank(aes(fill = `Importo medio mensile`))

Note that we need to redefine fill= in the blank geom so that ggplot processes the correct range of values.
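The reason the range matters can be seen without drawing anything. This base-R sketch uses the Wage column from the question's example table; the masked column (called SOMETHING above) covers a narrower range than the original, so a color scale trained only on it would presumably stretch the gradient over 340 to 800 rather than over the full data:

```r
# Wage values copied from the example table in the question
wage <- c(530, 520, 410, 300, 205, 501, 700, 800,
          900, 740, 590, 420, 340, 290, 120)

# Same masking rule as in the code above
masked <- ifelse(wage > 300 & wage <= 800, wage, wage[NA])

range(wage)                  # what geom_blank(aes(fill = ...)) contributes
range(masked, na.rm = TRUE)  # what the masked column alone would give
sum(is.na(masked))           # number of provinces left unfilled
```

Whether you want the gradient to span the full range or only the selected one is a design choice; drop the aes() in geom_blank if you prefer the latter.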

15 Comments

Thank you again! After substituting my variable, I get this from R: Error in FUN(X[[i]], ...) : object 'Wage' not found ...
Not sure. Is there a reason you're using backticks around it? It's a simple enough symbol; backticks are typically required only if the name is not a normal R name (e.g., it starts with a number or includes spaces or special characters).
That is probably the reason! The name of my variable is actually not Wage but Importo medio mensile (which means "monthly average wage" in my native language). That is why I have been writing it with backticks in my script, and it has worked until now. This is how I'm using it: `Importo medio mensile`. How can I solve this?
Okay, then yes backticks are required (due to the spaces).
Yes, I'm sorry, this is the first time I've used a forum and I had no idea how it works. I didn't realize there was a guide to read beforehand. Thanks for explaining how to use the forum properly (I hope I've accepted the answers in the right way), because your explanations will be helpful for other people. And thank you for your R knowledge; without you it would have been impossible for me to solve this.