Posts

Puzzle of Conditions in dplyr

Puzzle: I am not sure how to condition on whether a value in column A appears in column B. I use the naive approach, A %in% B, inside a mutate() call. It works in the sense that R searches all values in column B. What's more, it even works within groups. However, I am not sure whether this works by luck or is the right way to do it. Example library(dplyr, warn.
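A minimal sketch of the puzzle, using a made-up data frame (the columns g, A and B are hypothetical). It suggests the behaviour is not luck. Inside mutate(), a bare column name such as B refers to the whole column, so A %in% B is an ordinary vectorised lookup. After group_by(), B refers only to the current group's rows.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data: two groups, each with its own A and B values
df <- tibble(
  g = c(1, 1, 2, 2),
  A = c("x", "y", "w", "z"),
  B = c("y", "w", "x", "q")
)

# Ungrouped: each value of A is checked against the whole column B
ungrouped <- df %>% mutate(in_B = A %in% B)

# Grouped: B only contains the values from the current group
grouped <- df %>%
  group_by(g) %>%
  mutate(in_B = A %in% B) %>%
  ungroup()

ungrouped$in_B
grouped$in_B
```

With these data the two results differ. For example, "w" in group 2 is found in B overall but not in group 2's own B.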

Cluster Standard Error Using R

The sandwich package is great for correcting for heteroskedasticity and autocorrelation, as well as for clustering standard errors. Several online guides explain how to cluster standard errors in lm and plm setups, and all of them mention that the degrees of freedom need to be corrected. However, in the newest sandwich version, 2.4-0, the new function vcovCL includes an option for the degrees-of-freedom correction. Here is the change log of version 2.
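A minimal sketch of clustering with vcovCL, assuming sandwich (2.4-0 or later) and lmtest are installed. The panel below is simulated, and the names firm, dat and the data-generating process are hypothetical. It computes the same matrix by hand to make the two corrections explicit: type = "HC1" applies (N - 1) / (N - K), and cadjust = TRUE applies G / (G - 1), where G is the number of clusters.

```r
library(sandwich)
library(lmtest)

# Simulated panel: a firm-level shock shared by all years of a firm
set.seed(123)
n_firms <- 50
n_years <- 10
firm <- rep(seq_len(n_firms), each = n_years)
shock <- rnorm(n_firms)[firm]
x <- rnorm(n_firms * n_years) + shock
y <- 1 + 0.5 * x + shock + rnorm(n_firms * n_years)
dat <- data.frame(y = y, x = x, firm = firm)

fit <- lm(y ~ x, data = dat)

# Clustered covariance with both small-sample corrections switched on
vc_cl <- vcovCL(fit, cluster = dat$firm, type = "HC1", cadjust = TRUE)
coeftest(fit, vcov. = vc_cl)

# The same matrix by hand: bread %*% meat %*% bread times the corrections
X <- model.matrix(fit)
e <- residuals(fit)
N <- nrow(X)
K <- ncol(X)
G <- length(unique(dat$firm))
bread <- solve(crossprod(X))
scores <- rowsum(X * e, group = dat$firm)  # one row of X_g' e_g per cluster
meat <- crossprod(scores)                  # sum over g of s_g s_g'
vc_manual <- G / (G - 1) * (N - 1) / (N - K) * bread %*% meat %*% bread
```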

`View()` in RStudio

RStudio provides the View() function. However, it has some restrictions: it slows down significantly as the number of rows increases, and the number of rows shown in the viewer panel is capped at 100.

The official introduction to View(): link
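One way to work around the slowdown is to open only a slice of a large data frame. The helper view_head() below is hypothetical, not part of RStudio. It calls View() only in an interactive session and returns the slice invisibly so it can also be used in scripts.

```r
# Hypothetical helper: open only the first n rows in the data viewer
view_head <- function(df, n = 100) {
  sub <- utils::head(df, n)
  if (interactive()) View(sub)  # skip the viewer in non-interactive runs
  invisible(sub)
}

big <- data.frame(id = seq_len(1e5), value = rnorm(1e5))
preview <- view_head(big, n = 100)
```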

Tips to avoid mysterious bugs

Don't use the same name for a data.frame and for a variable in it.

library(data.table)
a <- data.table(a = c(1,2,3,1,2,3), b = c("a", "b", "a", "b","a", "b"))
b <- a[b == "b"]

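Inside a[...], data.table cannot tell which b you mean: b could be the data.table b or the column b of data.table a. Column names take precedence, so in the next line b$a is evaluated on the character column a$b, and $ fails.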

a[a %in% b$a]
Error in b$a : $ operator is invalid for atomic vectors
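A minimal sketch of the fix: give the subset a name that is not also a column of a. The name b_rows is arbitrary.

```r
library(data.table)

a <- data.table(a = c(1, 2, 3, 1, 2, 3), b = c("a", "b", "a", "b", "a", "b"))

# Inside a[...], `b` is the column a$b, so this subset works as intended
b_rows <- a[b == "b"]

# `b_rows` is not a column of `a`, so it resolves to the data.table
matched <- a[a %in% b_rows$a]
matched
```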

Notes on the Constant Term in the Fixed Effect Model

In Stata, the fixed effect model (xtreg y x1 x2 x3, fe) reports a constant term. The underlying estimation equation is

\[y_{it} - \bar{y}_i + \bar{\bar{y}} = a + (x_{it} - \bar{x}_i + \bar{\bar{x}})\alpha + (\epsilon_{it} - \bar{\epsilon}_i + \bar{v})\]

with the constraint \(\bar{v} = 0\). Details can be found on Stata's website. In R, the plm package does not report this somewhat artificial intercept for within models; it can be computed separately, see help("within_intercept", package = "plm").
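A minimal sketch, assuming an installed plm version that provides within_intercept() and using plm's bundled Grunfeld data. Because the transformed variables on both sides keep the grand means, the intercept of the equation above works out to the grand mean of y minus the grand means of the regressors times the slope estimates. The code checks that against within_intercept().

```r
library(plm)
data("Grunfeld", package = "plm")

fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")

# Stata-style overall intercept for the within model
a_plm <- within_intercept(fe)

# By hand: grand mean of y minus grand means of x times the slopes
a_manual <- mean(Grunfeld$inv) -
  sum(colMeans(Grunfeld[, c("value", "capital")]) * coef(fe))

a_plm
a_manual
```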

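Least Squares Dummy Variable Regression vs. Fixed Effect Model

The fixed effect (within) model is much faster than LSDV, because it removes the unit means from the data instead of estimating a separate dummy coefficient for every unit.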
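A minimal sketch in base R on simulated data (the names id, unit_effect and demean() are hypothetical). It shows that the two approaches give the same slope, while LSDV fits one column per unit and the within estimator fits a single demeaned column. The system.time() calls are only for a rough local comparison; timings depend on the machine.

```r
set.seed(42)
n_id <- 300
n_t <- 5
id <- rep(seq_len(n_id), each = n_t)
unit_effect <- rnorm(n_id)[id]
x <- rnorm(n_id * n_t) + unit_effect
y <- 2 * x + unit_effect + rnorm(n_id * n_t)

# LSDV: intercept + x + (n_id - 1) unit dummies
lsdv <- lm(y ~ x + factor(id))

# Within estimator: subtract unit means, then a one-column regression
demean <- function(v, g) v - ave(v, g)
within_fit <- lm(demean(y, id) ~ demean(x, id) - 1)

coef(lsdv)["x"]
coef(within_fit)

system.time(lm(y ~ x + factor(id)))
system.time(lm(demean(y, id) ~ demean(x, id) - 1))
```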

Quick Subset Using data.table

In terms of speed, subsetting with .I in data.table is the fastest.
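A minimal sketch of the usual .I idiom on a small made-up table (columns g and v are hypothetical): compute the matching row numbers per group, then subset once. The .SD version returns the same rows. The .I version is commonly reported to be faster because each group only returns row indices; this sketch does not benchmark it.

```r
library(data.table)

dt <- data.table(g = c("x", "x", "y", "y", "y"), v = c(3, 5, 2, 9, 4))

# Row numbers (within the whole table) of each group's maximum
idx <- dt[, .I[v == max(v)], by = g]$V1
top <- dt[idx]

# Equivalent, usually slower alternative that builds .SD per group
sd_version <- dt[, .SD[v == max(v)], by = g]

top
```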

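NA Action of filter() in dplyr

filter() in dplyr keeps only the rows where the condition is TRUE, so when we filter a variable for values bigger or less than a certain value, rows where that variable is NA are dropped as well.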
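A minimal sketch on a made-up tibble: the NA row disappears with a plain comparison, and adding an explicit is.na() condition keeps it.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = c(1, 5, NA, 3))

# NA > 2 is NA, which filter() treats like FALSE
kept <- filter(df, x > 2)

# Keep the NA rows explicitly
kept_na <- filter(df, x > 2 | is.na(x))

kept$x
kept_na$x
```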

Hosting pages on GitHub using R Markdown and Hugo

My personal website is built with the R package blogdown and Hugo, and is hosted on GitHub. This page summarizes the steps I took to create this site.

Coercing the class of ifelse results

base::ifelse builds its result from the test vector, so when every element of the test is NA it returns a logical NA rather than an NA of the yes/no type. This makes data manipulation problematic when we generate a new variable with ifelse from existing variables that contain NAs. The following example shows how the class of the ifelse result changes.

test_fn <- function(x){
  ifelse(x > 0,
         "Positive",
         "Not positive")
}
class(test_fn(1))
## [1] "character"
class(test_fn(NA))
## [1] "logical"

To avoid this problem, we have two solutions:

  1. Coerce the class explicitly
  2. Use dplyr::if_else

test_fn2 <- function(x){
  dplyr::if_else(x > 0,
         "Positive",
         "Not positive")
}
class(test_fn2(1))
## [1] "character"
class(test_fn2(NA))
## [1] "character"
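For the first solution, a minimal sketch (test_fn1 is a hypothetical name) wraps the base::ifelse call in as.character(), so an all-NA test still gives a character NA.

```r
test_fn1 <- function(x){
  # Coerce explicitly so the class no longer depends on the test vector
  as.character(ifelse(x > 0,
                      "Positive",
                      "Not positive"))
}
class(test_fn1(1))
class(test_fn1(NA))
```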