Puzzle of Conditions in dplyr

Puzzle

I am not sure how to condition on whether a value in column A appears anywhere in column B.

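My naive approach is to write

a %in% b

inside a mutate() call, where a and b are the two columns. It works: for each value of column A, R searches all values in column B. What's more, it even works within groups: after group_by(), only the values of column B in the same group are searched. However, I am not sure whether this works by luck or is the right way to do it.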
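As far as the semantics go, this is not luck. mutate() evaluates each expression on whole column vectors (or, after group_by(), once per group on that group's rows), and `%in%` is vectorized over its left-hand side: base R defines `x %in% table` as `match(x, table, nomatch = 0) > 0`, so every element of `a` is looked up in the entire `b` vector it is given. The minimal base-R sketch below assumes the same `a` and `b` values as the example data and reproduces what mutate() computes in the ungrouped case, contrasting `%in%` with the row-by-row `==`.

```r
# Column values copied from the example data frame
a <- c(1, 2, 3, 4, 5)
b <- c(3, 2, 4, 5, 6)

# %in%: is each element of `a` found anywhere in the whole vector `b`?
in_b <- a %in% b
# The same test spelled out with match(), which is how %in% is defined in base R
in_b_match <- match(a, b, nomatch = 0L) > 0L

# ==: compares element i of `a` with element i of `b` only
same_row <- a == b

print(as.integer(in_b))      # 0 1 1 1 1  (column c in the example)
print(as.integer(same_row))  # 0 1 0 0 0  (column d in the example)
```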

Example

library(dplyr, warn.conflicts = FALSE)

Data

df <- data.frame(a = c(1,2,3,4,5), b = c(3,2,4,5,6), group = c("a","a", "b", "b", "c"))
print(df)
##   a b group
## 1 1 3     a
## 2 2 2     a
## 3 3 4     b
## 4 4 5     b
## 5 5 6     c

Without groups

df %>% mutate(c = as.integer(a %in% b), d = as.integer(a == b))
##   a b group c d
## 1 1 3     a 0 0
## 2 2 2     a 1 1
## 3 3 4     b 1 0
## 4 4 5     b 1 0
## 5 5 6     c 1 0
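Columns `c` and `d` differ in rows 3 to 5 because `%in%` ignores row alignment. One way to see which row of `b` produced each hit is `match()`, which returns the position of the first match, or `NA` if there is none. The plain base-R sketch below rebuilds the same `df` and adds a helper column, `match_row`, whose name is purely illustrative.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(3, 2, 4, 5, 6),
                 group = c("a", "a", "b", "b", "c"))

# Position in the whole column b where each value of a first appears (NA = absent)
df$match_row <- match(df$a, df$b)
print(df)
```

For example, `a = 3` in row 3 is found in row 1 of `b`, so `c` is 1 there even though `a == b` is FALSE.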

By groups

df %>% 
  group_by(group) %>% 
  mutate(c = as.integer(a %in% b), d = as.integer(a == b))
## # A tibble: 5 x 5
## # Groups:   group [3]
##       a     b group     c     d
##   <dbl> <dbl> <fct> <int> <int>
## 1  1.00  3.00 a         0     0
## 2  2.00  2.00 a         1     1
## 3  3.00  4.00 b         0     0
## 4  4.00  5.00 b         1     0
## 5  5.00  6.00 c         0     0
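The grouped result differs from the ungrouped one in rows 3 and 5. `a = 3` and `a = 5` both appear in column `b`, but only in rows that belong to other groups (row 1 is in group a, row 4 is in group b), so they are not found inside their own group. This happens because `group_by()` makes `mutate()` evaluate `a %in% b` once per group, with `a` and `b` bound to that group's rows. The base-R loop below is a rough stand-in for that behaviour, not how dplyr implements it: it splits the row indices by `group` and applies `%in%` to each slice. The column names `c_grouped` and `c_all` are illustrative.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(3, 2, 4, 5, 6),
                 group = c("a", "a", "b", "b", "c"))

# Row indices for each group: list(a = 1:2, b = 3:4, c = 5)
rows_by_group <- split(seq_len(nrow(df)), df$group)

df$c_grouped <- NA_integer_
for (idx in rows_by_group) {
  # Only this group's values of b are searched
  df$c_grouped[idx] <- as.integer(df$a[idx] %in% df$b[idx])
}

# Ungrouped version for comparison: searches the whole column b
df$c_all <- as.integer(df$a %in% df$b)

print(df)
```

If the whole column should be searched even though the data are grouped, one option is to call `ungroup()` before the `mutate()`.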