14

I have two named vectors, as below:

Vec1 = c(1,2,3,4)
names(Vec1) = c('Val1', 'Val2', 'Val3', 'Val4')

Vec2 = c(10, 11)
names(Vec2) = c('Val2', 'Val4')

Is there an easy way to subtract Vec2 from Vec1 by matching the element names? Names that are not present in Vec2 can be assumed to have a value of zero.


7 Answers

12
`[<-`(Vec1, names(Vec2), Vec1[names(Vec2)] - Vec2)

## OR using replace() without creating a helper variable

Vec1 |> replace(names(Vec2), Vec1[names(Vec2)] - Vec2)

Result:

#> Val1 Val2 Val3 Val4 
#>    1   -8    3   -7
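
For readers who find the one-liners dense, both can be read as shorthand for an explicit copy-then-assign. This is a minimal sketch; the name `out` is introduced here for illustration and does not appear in the answer.

```r
Vec1 <- c(Val1 = 1, Val2 = 2, Val3 = 3, Val4 = 4)
Vec2 <- c(Val2 = 10, Val4 = 11)

# Work on a copy so that Vec1 itself is left untouched
out <- Vec1

# Character indexing picks the elements of `out` whose names occur in Vec2
out[names(Vec2)] <- out[names(Vec2)] - Vec2

out
```

This assumes every name in Vec2 also occurs in Vec1, as in the question.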

10

If Vec1 always has the full length (and Vec2 may or may not have missing entries), an approach using replace() is:

Vecs <- Vec2[names(Vec1)]

Vec1 - replace(Vecs, is.na(Vecs), 0)
Val1 Val2 Val3 Val4 
   1   -8    3   -7
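
It may help to look at the intermediate vector to see why this works. The sketch below only spells out the steps of the answer; `aligned` and `result` are illustrative names.

```r
Vec1 <- c(Val1 = 1, Val2 = 2, Val3 = 3, Val4 = 4)
Vec2 <- c(Val2 = 10, Val4 = 11)

# Indexing Vec2 by a name it does not have yields NA at that position
Vecs <- Vec2[names(Vec1)]
unname(Vecs)
#> [1] NA 10 NA 11

# Turning those NAs into 0 leaves the corresponding Vec1 values unchanged
aligned <- replace(Vecs, is.na(Vecs), 0)
result <- Vec1 - aligned
```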


10

R makes things easy when your objects are in a data frame. If we create a column containing -Vec2 we can take the rowSums() to get the desired output:

rowSums(data.frame(Vec1, -Vec2[names(Vec1)]), na.rm = TRUE)

Output:

Val1 Val2 Val3 Val4
   1   -8    3   -7 

A note on efficiency

Alternatively, and probably slightly more efficiently, you can create a matrix:

rowSums(cbind(Vec1, -Vec2[names(Vec1)]), na.rm = TRUE)
# ^^ same output

You might be tempted to think that this is less efficient, as a matrix is a memory-contiguous array, whereas a data frame is a list of memory-contiguous vectors with copy-on-modify semantics. This means data.frame(Vec1, Vec2) does not copy Vec1 or Vec2, whereas cbind(Vec1, Vec2) copies both.

This is true, but the source for rowSums() contains the line if(is.data.frame(x)) x <- as.matrix(x), so you end up creating a matrix anyway. If your data is very large and you want to avoid copies as much as possible you should use the answer by Andre Wildberg, which at least avoids copying Vec1. However, it's usually better to optimise for readability and (particularly if you might expand this to more vectors) my view is that putting the data in tabular form makes the most sense.
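
To illustrate the point about expanding to more vectors, here is a sketch with a third vector. `Vec3` is hypothetical and not part of the question; each extra vector simply becomes one more column aligned to the names of Vec1.

```r
Vec1 <- c(Val1 = 1, Val2 = 2, Val3 = 3, Val4 = 4)
Vec2 <- c(Val2 = 10, Val4 = 11)
Vec3 <- c(Val1 = 5, Val4 = 1)  # hypothetical third vector

# One column per vector; missing names become NA and are skipped by na.rm
m <- cbind(Vec1, -Vec2[names(Vec1)], -Vec3[names(Vec1)])
total <- rowSums(m, na.rm = TRUE)
total
```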


9

Using match() to Align Indices:

idx <- match(names(Vec2), names(Vec1))
Vec1[idx] <- Vec1[idx] - Vec2
Vec1
Val1 Val2 Val3 Val4 
   1   -8    3   -7 
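
The code above assumes every name in Vec2 is present in Vec1. If that might not hold, match() returns NA for the unmatched names, and those NA indices are not usable in the assignment. A hedged sketch that drops them first; the name "Val9" and the variable `keep` are made up for illustration:

```r
Vec1 <- c(Val1 = 1, Val2 = 2, Val3 = 3, Val4 = 4)
Vec2 <- c(Val2 = 10, Val9 = 99)  # "Val9" is hypothetical and absent from Vec1

idx  <- match(names(Vec2), names(Vec1))  # NA where a name has no partner
keep <- !is.na(idx)

# Subtract only the entries of Vec2 that have a match in Vec1
Vec1[idx[keep]] <- Vec1[idx[keep]] - Vec2[keep]
Vec1
```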

1 Comment

For those who wonder, the relationship between %in% and match is "%in%" <- function(x, table) match(x, table, nomatch = 0) > 0.
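
A small check of that relationship, using made-up inputs (`x` and `table` are illustrative only):

```r
x     <- c("Val2", "Val9")
table <- c("Val1", "Val2", "Val3", "Val4")

via_in    <- x %in% table
via_match <- match(x, table, nomatch = 0) > 0

identical(via_in, via_match)
```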
7

We can use merge() with some cosmetics (not strictly needed):

merge(data.frame(Vec1, n=names(Vec1)), 
      data.frame(-Vec2, n=names(Vec2)), by='n', all.x=TRUE) |>
  subset(select=-n) |> 
  rowSums(na.rm=TRUE) |> # or replace
  setNames(names(Vec1))
Val1 Val2 Val3 Val4 
  1   -8    3   -7 

Didn't know this is about speed. As mentioned in a now-deleted comment posted shortly after the question: "use match with names and simplest indexing." {collapse} should be hard to beat:

Vec1[i] = Vec1[i<-collapse::fmatch(names(Vec2), names(Vec1))]-Vec2  
Val1 Val2 Val3 Val4 
  1   -8    3   -7 


6
ifelse(hasName(Vec2, names(Vec1)), Vec1 - Vec2[names(Vec1)], Vec1)
# [1]  1 -8  3 -7

EDIT: hasName() replaced %in%.
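
Note that the result above is unnamed, because ifelse() takes its attributes from the test and hasName() returns a plain logical vector. If the names are wanted, one option is to put them back with setNames(). A sketch, with `in_vec2` and `res` as illustrative names:

```r
Vec1 <- c(Val1 = 1, Val2 = 2, Val3 = 3, Val4 = 4)
Vec2 <- c(Val2 = 10, Val4 = 11)

# TRUE where the name of a Vec1 element also exists in Vec2
in_vec2 <- hasName(Vec2, names(Vec1))

res <- ifelse(in_vec2, Vec1 - Vec2[names(Vec1)], Vec1)
res <- setNames(res, names(Vec1))  # restore the names dropped by ifelse()
res
```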


5

You can try

library(data.table)

Vec1 - nafill(Vec2[names(Vec1)], fill = 0)
Val1 Val2 Val3 Val4
   1   -8    3   -7

Benchmarking

library(microbenchmark)

set.seed(0)
n1 <- 1e5
Vec1 <- setNames(sample.int(n1), paste0("v", 1:n1))
n2 <- 1e2
Vec2 <- setNames(sample.int(n2), paste0("v", sample(n1, n2)))

microbenchmark(
  `M--1` = `[<-`(Vec1, names(Vec2), Vec1[names(Vec2)] - Vec2),
  `M--2` = Vec1 |> replace(names(Vec2), Vec1[names(Vec2)] - Vec2),
  `SamR` = rowSums(data.frame(Vec1, -Vec2[names(Vec1)]), na.rm = TRUE),
  `Andre` = {
    Vecs <- Vec2[names(Vec1)]
    Vec1 - replace(Vecs, is.na(Vecs), 0)
  },
  `SAL` = {
    idx <- match(names(Vec2), names(Vec1))
    Vec1[idx] <- Vec1[idx] - Vec2
  },
  `Friede` = {
    merge(data.frame(Vec1, n = names(Vec1)),
      data.frame(-Vec2, n = names(Vec2)),
      by = "n", all.x = TRUE
    ) |>
      subset(select = -n) |>
      rowSums(na.rm = TRUE) |> # or replace
      setNames(names(Vec1))
  },
  `s_baldur` = ifelse(names(Vec1) %in% names(Vec2), Vec1 - Vec2[names(Vec1)], Vec1),
  `Thomas` = Vec1 - data.table::nafill(Vec2[names(Vec1)], fill = 0),
  unit = "relative",
  times = 50L
)

Running this shows that the base R solution provided by @SAL is the most efficient:

Unit: relative
     expr        min         lq       mean     median         uq        max neval
     M--1   2.513836   2.490961   2.909901   2.534369   2.738404   3.010798    50
     M--2   2.616862   2.362815   2.629789   2.403732   2.533827   3.291382    50
     SamR  10.188135   8.827689  10.617243   9.646134  10.883201  16.071042    50
    Andre   4.191419   4.021243   5.773643   4.326937   5.428873  18.314273    50
      SAL   1.000000   1.000000   1.000000   1.000000   1.000000   1.000000    50
   Friede 232.612582 192.581732 180.411511 187.376767 177.789439 123.654579    50
 s_baldur   7.437139   6.561556   7.051161   7.389931   7.969211   5.450089    50
   Thomas   4.000498   3.631732   4.174856   3.677764   5.209227   3.736972    50
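
Before reading too much into timings, it can be worth confirming on the small example from the question that the approaches agree. A minimal sketch covering three of them; the `res_*` names are illustrative, and the match() variant works on a local copy so that Vec1 is not altered:

```r
Vec1 <- c(Val1 = 1, Val2 = 2, Val3 = 3, Val4 = 4)
Vec2 <- c(Val2 = 10, Val4 = 11)

res_replace <- replace(Vec1, names(Vec2), Vec1[names(Vec2)] - Vec2)

res_match <- local({
  v <- Vec1                            # copy, so Vec1 stays untouched
  idx <- match(names(Vec2), names(v))
  v[idx] <- v[idx] - Vec2
  v
})

res_rowsums <- rowSums(cbind(Vec1, -Vec2[names(Vec1)]), na.rm = TRUE)

# Compare the approaches pairwise
all.equal(res_replace, res_match)
all.equal(res_replace, res_rowsums)
```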

5 Comments

Didn't know this is about speed. Added a {collapse} option to my answer.
Don't think it's about speed but it doesn't hurt to see that the cleanest solution (imo) is also the fastest.
@s_baldur yes, that is true and I agree with your point. The observation from the benchmark is one of the reasons I love base R :D
@ThomasIsCoding I am getting a different result from the benchmark i.sstatic.net/zOf1KQ85.png :/ p.s. not necessarily arguing for my answer, just wondering why the benchmark is this much different.
And my 3rd solution, is just SAL's solution without altering the original vector: match(names(Vec2), names(Vec1)) |> (\(idx) {Vec1[idx] <- Vec1[idx] - Vec2; Vec1})()
